(57) Subject of the present invention is an isolated
human nucleic acid molecule encoding polypeptides 
containing a homeobox domain of sixty amino acids 
having the amino acid sequence of SEQ ID NO: 1 and 
having regulating activity on human growth. 

Three novel genes residing within the approximately 500 kb short stature critical region on the X and Y chromosomes were identified. At least one of these genes is responsible for the short stature phenotype. The cDNA corresponding to this gene may be used in diagnostic tools and to further characterize the molecular basis of the short stature phenotype. In addition, the identification of the gene product provides new means and methods for the development of superior therapies for short stature.
Description

[0001] The present invention relates to the isolation, identification and characterization of newly identified human genes responsible for disorders relating to human growth, especially short stature or Turner syndrome, as well as to the diagnosis and therapy of such disorders.

[0002] The isolated genomic DNA or fragments thereof can be used for pharmaceutical purposes or as diagnostic tools or reagents for the identification or characterization of the genetic defect involved in such disorders. Subject of the present invention are further human growth proteins (transcription factors A, B and C) which are expressed after transcription of said DNA into RNA or mRNA and which can be used in the therapeutic treatment of disorders related to mutations in said genes. The invention further relates to appropriate cDNA sequences which can be used for the preparation of recombinant proteins suitable for the treatment of such disorders. Subject of the invention are further plasmid vectors for the expression of the DNA of these genes and appropriate cells containing such DNAs. It is a further subject of the present invention to provide means and methods for the genetic treatment of such disorders in the area of molecular medicine using an expression plasmid prepared by incorporating the DNA of this invention downstream from an expression promoter which effects expression in a mammalian host cell.

[0003] Growth is one of the fundamental aspects of the development of an organism and is regulated by a highly organised and complex system. Height is a multifactorial trait, influenced by both environmental and genetic factors. Developmental malformations concerning body height are common phenomena among humans of all races. With an incidence of 3 in 100, growth retardation resulting in short stature accounts for the large majority of inborn deficiencies seen in humans.

[0004] With an incidence of 1:2500 live-born phenotypic females, Turner syndrome is a common chromosomal disorder (Rosenfeld et al., 1996). It has been estimated that 1-2% of all human conceptions are 45,X and that as many as 99% of such fetuses do not come to term (Hall and Gilchrist, 1990; Robins, 1990). Significant clinical variability exists in the phenotype of persons with Turner syndrome (or Ullrich-Turner syndrome) (Ullrich, 1930; Turner, 1938). Short stature, however, is a consistent finding and, together with gonadal dysgenesis, is considered a lead symptom of this disorder. Turner syndrome is a true multifactorial disorder. The embryonic lethality, the short stature, the gonadal dysgenesis and the characteristic somatic features are all thought to be due to monosomy of genes common to the X and Y chromosomes. A diploid dosage of those X-Y homologous genes is thought to be required for normal human development. Turner genes (or anti-Turner genes) are expected to be expressed from both the active and the inactive X chromosome, or from the Y chromosome, to ensure correct dosage of the gene product. Haploinsufficiency (deficiency due to only one active copy) would consequently be the suggested genetic mechanism underlying the disease.
[0005] A variety of mechanisms underlying short stature have been elucidated so far. Growth hormone and growth hormone receptor deficiencies as well as skeletal disorders have been described as causes of the short stature phenotype (Martial et al., 1979; Phillips et al., 1981; Leung et al., 1987; Goddard et al., 1995). Recently, mutations in three human fibroblast growth factor receptor-encoding genes (FGFR1-3) were identified as the cause of various skeletal disorders, including the most common form of dwarfism, achondroplasia (Shiang et al., 1994; Rousseau et al., 1994; Muenke and Schell, 1995). A well-known and frequent (1:2500 females) chromosomal disorder, Turner syndrome (45,X), is also consistently associated with short stature. Taken together, however, all these known causes account for only a small fraction of all short patients, leaving the vast majority of short stature cases unexplained to date.

[0006] The sex chromosomes X and Y are believed to harbor genes influencing height (Ogata and Matsuo, 1993). This has been deduced from genotype-phenotype correlations in patients with sex chromosome abnormalities. Cytogenetic studies have provided evidence that terminal deletions of the short arm of either the X or the Y chromosome consistently lead to short stature in the respective individuals (Zuffardi et al., 1982; Curry et al., 1984). More than 20 chromosomal rearrangements associated with terminal deletions of chromosome Xp and Yp have been reported that localize the gene(s) responsible for short stature to the pseudoautosomal region (PAR1) (Ballabio et al., 1989; Schaefer et al., 1993). This localisation has been narrowed down to the most distal 700 kb of DNA of the PAR1 region, with DXYS15 as the flanking marker (Ogata et al., 1992; 1995).

[0007] Mammalian growth regulation is organized as a complex system. It is conceivable that multiple growth-promoting genes (proteins) interact with one another in a highly organized way. One of the genes controlling height has tentatively been mapped to the pseudoautosomal region PAR1 (Ballabio et al., 1989), a region known to be freely exchanged between the X and Y chromosomes (for a review see Rappold, 1993). The entire PAR1 region is approximately 2,700 kb.

[0008] The critical region for short stature has been defined with deletion patients. Short stature results when the entire 700 kb region is deleted, or when a specific gene within this critical region is present in a haploid state, interrupted or mutated (as is the case in idiopathic short stature or Turner syndrome). The frequency of Turner syndrome is 1 in 2,500 females worldwide; the frequency of this kind of idiopathic short stature can be estimated at 1 in 4,000-5,000 persons. Turner females and some individuals with short stature usually receive a non-specific treatment with growth hormone (GH) for many years, up to over a decade, although it is well known that they have normal GH levels and that GH deficiency is not the problem. The treatment of such patients is very expensive (estimated costs approximately 30,000 USD per year). Therefore, the problem existed to provide a method and means for distinguishing short stature patients who have a genetic defect in the respective gene from patients who do not have any genetic defect in this gene. Patients with a genetic defect in the respective gene, either a complete gene deletion (as in Turner syndrome) or a point mutation (as in idiopathic short stature), should be amenable to an alternative treatment without human GH, which can now be devised.

[0009] Genotype/phenotype correlations have supported the existence of a growth gene in the proximal part of Yq and in the distal part of Yp. Short stature is also consistently found in individuals with terminal deletions of Xp. Recently, an extensive search for male and female patients with partial monosomies of the pseudoautosomal region has been undertaken. On the basis of genotype-phenotype correlations, a minimal common region of deletion of 700 kb of DNA adjacent to the telomere was determined (Ogata et al., 1992; Ogata et al., 1995). The region of interest was shown to lie between the genetic markers DXYS20 (3cosPP) and DXYS15 (113D), and all candidate genes for growth control from within the PAR1 region (e.g. the hemopoietic growth factor receptor α, CSF2RA) (Gough et al., 1990) were excluded based on their physical location (Rappold et al., 1992). That is, the genes were within the 700 kb deletion region of the 2,700 kb PAR1 region.

[0010] Deletions of the pseudoautosomal region (PAR1) of the sex chromosomes were recently discovered in individuals with short stature, and subsequently a minimal common deletion region of 700 kb within PAR1 was defined. Southern blot analysis of DNA from patients AK and SS using different pseudoautosomal markers identified an Xp terminal deletion of about 700 kb distal to DXYS15 (113D) (Ogata et al., 1992; Ogata et al., 1995).

[0011] The gene region corresponding to short stature has been identified as a region of approximately 500 kb, preferably approximately 170 kb, in the PAR1 region of the X and Y chromosomes. Three genes in this region have been identified as candidates for the short stature gene. These genes were designated SHOX (short stature homeobox-containing gene; also referred to as SHOX93 or HOX93), pET92 and SHOT (SHOX-like homeobox gene on chromosome three). The gene SHOX, which has two alternative splice sites giving rise to two variants (SHOXa and SHOXb), is of particular importance. In preliminary investigations, essential parts of the nucleotide sequence of the short stature gene were analysed (SEQ ID No. 8). Respective exons or parts thereof were predicted and identified (e.g. exon I [G310]; exon II [ET93]; exon IV [G108]; pET92). The sequence information obtained could then be used to design appropriate primers or nucleotide probes which hybridize to parts of the SHOX gene or fragments thereof, and the SHOX gene can then be isolated by conventional methods. By further analysis of the DNA sequence of the genes responsible for short stature, the nucleotide sequence of exons I-V was refined (see Figs. 1-3). The gene SHOX contains a homeobox sequence (SEQ ID NO: 1) of approximately 180 bp (see Figs. 2 and 3), extending from the codon for amino acid position 117 (Q) to the codon for amino acid position 176 (E), i.e. from CAG (440) to GAG (619). The homeobox sequence is identified as the homeobox-pET93 (SHOX) sequence. By screening 250 individuals with idiopathic short stature to date, two point mutations were found in individuals with short stature, a German patient (A1) and a Japanese patient. Both point mutations were found at the identical position and lead to a protein truncation at amino acid position 195, suggesting that a mutational hot spot may exist. Since both truncating mutations lie at the identical position, a putative hot spot of recombination may exist within exon 4 (G108). Exon-specific primers can therefore be used as indicated below, e.g. GCA CAG CCA ACC ACC TAG (for) or TGG AAA GGC ATC ATC CGT AAG (rev).
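The nucleotide coordinates quoted for the homeobox can be checked against the SHOXa start codon at nucleotide 92 (Figure 2 description): codon n begins at nucleotide 92 + 3(n - 1). The helper names below are hypothetical and serve only to illustrate this arithmetic.

```python
def codon_span(cds_start: int, codon: int) -> tuple[int, int]:
    """Return the 1-based (first, last) nucleotide positions of a codon.

    cds_start: position of the A of the ATG start codon (codon 1).
    codon: 1-based amino acid position in the predicted protein.
    """
    first = cds_start + 3 * (codon - 1)
    return first, first + 2


def domain_span(cds_start: int, first_codon: int, last_codon: int) -> tuple[int, int, int]:
    """Return (first nt, last nt, length in bp) of a domain given its codon range."""
    start, _ = codon_span(cds_start, first_codon)
    _, end = codon_span(cds_start, last_codon)
    return start, end, end - start + 1


# SHOXa: ATG at nucleotide 92; homeobox from Q117 to E176.
print(domain_span(92, 117, 176))  # (440, 619, 180)
# SHOT: ATG at nucleotide 43; homeobox from Q11 to E70.
print(domain_span(43, 11, 70))  # (73, 252, 180)
```

The first result reproduces CAG (440) to GAG (619) and the 180 bp length (60 codons). The SHOT positions 73-252 are derived here from the Figure 4 description and are not stated in the text.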

[0012] The above-mentioned novel homeobox-containing gene SHOX, which is located within the 170 kb interval, is alternatively spliced, generating two proteins with different functions. Mutation analysis and DNA sequencing were used to demonstrate that short stature can be caused by mutations in SHOX.

[0013] The identification and cloning of the short stature critical region according to the present invention was performed as follows: extensive physical mapping studies were performed on 15 individuals with partial monosomy of the pseudoautosomal region (PAR1). By correlating the height of those individuals with their deletion breakpoints, a short stature (SS) critical region of approximately 700 kb was defined. This region was subsequently cloned as an overlapping cosmid contig using yeast artificial chromosomes (YACs) from PAR1 (Ried et al., 1996) and by cosmid walking. To search for candidate genes for SS within this interval, a variety of techniques were applied to an approximately 600 kb region between the distal end of cosmid 56G10 and the proximal end of cosmid 51D11. Using cDNA selection, exon trapping and CpG island cloning, two novel genes were identified.

[0014] The position of the short stature critical interval could be refined to a smaller interval of 170 kb of DNA by characterizing three further individuals (GA, AT and RY). To precisely localize the rearrangement breakpoints of those individuals, fluorescence in situ hybridization (FISH) on metaphase chromosomes was carried out using cosmids from the contig. Patient GA, with a terminal deletion and normal height, defined the distal boundary of the critical region (with the breakpoint on cosmid 110E3), and patient AT, with an X chromosome inversion and normal height, defined the proximal boundary (with the breakpoint on cosmid 34F5). The Y-chromosomal breakpoint of patient RY, with a terminal deletion and short stature, was also found to lie on cosmid 34F5, suggesting that this region contains sequences predisposing to chromosome rearrangements.
[0015] The entire region, bounded by the Xp/Yp telomere, has been cloned as a set of overlapping cosmids. Fluorescence in situ hybridization (FISH) with cosmids from this region was used to study six patients with X chromosomal rearrangements, three with normal height and three with short stature. Genotype-phenotype correlations narrowed down the critical short stature interval to 270 kb of DNA, or even to as little as 170 kb, containing the gene or genes with an important role in human growth. A minimal tiling path of six to eight cosmids bridging this interval is now available for interphase and metaphase FISH, providing a valuable tool for diagnostic investigations of patients with idiopathic short stature.
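The boundary logic described in [0014] and [0015] can be sketched for the simplest case, terminal deletions measured from the Xp/Yp telomere: a patient of normal height who has lost a segment shows that the gene lies proximal to that breakpoint, while a short patient shows that it lies distal to it. The function and the breakpoint values below are hypothetical; the sketch assumes a single dosage-sensitive gene with complete penetrance and does not model inversions such as that of patient AT.

```python
from typing import Iterable, Optional, Tuple


def critical_interval(
    patients: Iterable[Tuple[float, bool]],
) -> Optional[Tuple[float, float]]:
    """Narrow a critical region from terminal deletions.

    Each patient is (breakpoint distance from the telomere in kb, is_short).
    Assumes one dosage-sensitive gene, complete penetrance and terminal
    deletions only (inversions and translocations are not modelled).
    """
    patients = list(patients)
    short_ends = [kb for kb, is_short in patients if is_short]
    normal_ends = [kb for kb, is_short in patients if not is_short]
    if not short_ends:
        return None  # no affected patient, nothing to localize
    # Normal height despite losing [0, kb]: the gene lies proximal to kb.
    distal_bound = max(normal_ends, default=0.0)
    # Short stature after losing [0, kb]: the gene lies distal to kb.
    proximal_bound = min(short_ends)
    if distal_bound >= proximal_bound:
        return None  # observations are inconsistent with this simple model
    return distal_bound, proximal_bound


# Hypothetical breakpoints (kb from the telomere), not patient data.
print(critical_interval([(300.0, False), (700.0, True), (500.0, True)]))
# (300.0, 500.0)
```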

Brief Description of the Drawings 


[0016] Figure 1 is a gene map of the SHOX gene including five exons, identified as follows: exon I: G310, exon II: ET93, exon III: ET45, exon IV: G108, and exons Va and Vb, whereby exons Va and Vb result from two different splice sites of the SHOX gene. Exons II and III contain the homeobox sequence of 180 nucleotides.
[0017] Figures 2 and 3 show the nucleotide and predicted amino acid sequences of SHOXa and SHOXb:

SHOXa: The predicted start of translation is at nucleotide 92, with the first in-frame stop codon (TGA) at nucleotides 968-970, yielding an open reading frame of 876 bp that encodes a predicted protein of 292 amino acids (designated transcription factor A or SHOXa protein). An in-frame 5' stop codon at nucleotide 4, the start codon and the predicted termination codon are in bold. The homeobox is boxed (from amino acid position 117 (Q) to 176 (E), i.e. CAG through GAG in the nucleotide sequence). The locations of introns are indicated with arrows. Two putative polyadenylation signals in the 3' untranslated region are underlined.

SHOXb: An open reading frame extends from the A of the first methionine codon at nucleotide 92 to the in-frame stop codon at nucleotides 767-769, yielding an open reading frame of 675 bp that encodes a predicted protein of 225 amino acids (transcription factor B or SHOXb protein). The locations of introns are indicated with arrows. Exons I-IV are identical to those of SHOXa; exon V is specific for SHOXb. A putative polyadenylation signal in the 3' untranslated region is underlined.

[0018] Figure 4 shows the nucleotide and predicted amino acid sequence of SHOT. The predicted start of translation is at nucleotide 43, with the first in-frame stop codon (TGA) at nucleotides 613-615, yielding an open reading frame of 573 bp that encodes a predicted protein of 190 amino acids (designated transcription factor C or SHOT protein). The homeobox is boxed (from amino acid position 11 (Q) to 70 (E), i.e. CAG through GAG in the nucleotide sequence). The locations of introns are indicated with arrows. Two putative polyadenylation signals in the 3' untranslated region are underlined.
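As a consistency check on the figure legends, the ORF length and the encoded protein length follow directly from the position of the start codon and the first base of the stop codon. The helper below is a hypothetical illustration. It shows that the legends quote the ORF length without the stop codon for SHOXa (876 bp) and SHOXb (675 bp) but with it for SHOT (573 bp), while all three agree with the stated protein lengths.

```python
def orf_summary(start_nt: int, stop_first_nt: int) -> dict[str, int]:
    """Summarize an ORF from the ATG position and the first base of the stop codon."""
    coding_nt = stop_first_nt - start_nt  # sense codons only, stop codon excluded
    if coding_nt % 3:
        raise ValueError("stop codon is not in frame with the start codon")
    return {
        "bp_without_stop": coding_nt,
        "bp_with_stop": coding_nt + 3,
        "amino_acids": coding_nt // 3,
    }


# Start and stop positions as given in the descriptions of Figures 2-4.
for name, start, stop in [("SHOXa", 92, 968), ("SHOXb", 92, 767), ("SHOT", 43, 613)]:
    print(name, orf_summary(start, stop))
```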
[0019] Figure 5 gives the exon/intron organization of the human SHOX gene and the respective positions in the nucleotide sequence.

Brief Description of the SEQ ID: 

[0020]

SEQ ID NO. 1: translated amino acid sequence of the homeobox domain (180 bp)

SEQ ID NO. 2: exon II (ET93) of the SHOX gene 

SEQ ID NO. 3: exon I (G310) of the SHOX gene 
SEQ ID NO. 4: exon III (ET45) of the SHOX gene

SEQ ID NO. 5: exon IV (G108) of the SHOX gene 

SEQ ID NO. 6: exon Va of the SHOX gene 

SEQ ID NO. 7: exon Vb of the SHOX gene 

SEQ ID NO. 8: preliminary nucleotide sequence of the SHOX gene 
SEQ ID NO. 9: ET92 gene

SEQ ID NO. 10: SHOXa sequence (see also fig. 2) 

SEQ ID NO. 11: transcription factor A (see also fig. 2) 

SEQ ID NO. 12: SHOXb sequence (see also fig. 3) 

SEQ ID NO. 13: transcription factor B (see also fig 3) 
SEQ ID NO. 14: SHOX gene

SEQ ID NO. 15: SHOT sequence (see also fig. 4) 

SEQ ID NO. 16: transcription factor C (see also fig. 4) 
[0021] Since the target gene causing disorders of human growth (e.g. the short stature region) was unknown prior to the present invention, the biological and clinical association of patients with this deletion could give insight into the function of this gene. In the present study, fluorescence in situ hybridization (FISH) was used to examine metaphase and interphase lymphocyte nuclei of six patients. The aim was to test all cosmids of the overlapping set for their utility as FISH probes and to determine the breakpoint regions in all four cases, thereby determining the minimal critical region for the short stature gene.

[0022] Duplication and deletion of genomic DNA can be assessed technically by carefully controlled quantitative PCR, by dose estimation on Southern blots or by using RFLPs. However, a particularly reliable method for accurately distinguishing between single and double doses of markers is FISH, the clinical application of which is now routine. Whereas interphase FISH only allows the absence or presence of a molecular marker to be evaluated, FISH on metaphase chromosomes may provide a semi-quantitative measurement of inter-cosmid deletions. The present inventor has determined that deletions of about 10 kb (25% signal reduction) can still be detected. This is of importance, as practically all disease genes on the human X chromosome have been associated with deletions ranging from a few kilobases to several megabases of DNA (Nelson et al., 1995).
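The 25% figure for a deletion of about 10 kb is what a simple proportional model predicts if a cosmid probe carries about 40 kb of target sequence (a typical cosmid insert size; the sizes of the probes used here are not stated) and the FISH signal scales linearly with the amount of hybridizing sequence. A minimal sketch of that assumption, with a hypothetical function name:

```python
def fish_signal_reduction(deleted_kb: float, probe_kb: float = 40.0) -> float:
    """Expected fractional loss of FISH signal for a partial probe deletion.

    Assumes signal intensity scales linearly with the length of target
    sequence bound by the probe and a cosmid insert of about 40 kb.
    """
    if not 0 <= deleted_kb <= probe_kb:
        raise ValueError("deleted_kb must lie between 0 and probe_kb")
    return deleted_kb / probe_kb


print(fish_signal_reduction(10.0))  # 0.25
```

Under this model, smaller inserts would make the same deletion easier to see, and real signal intensities also depend on probe labeling and repeat suppression, which the sketch ignores.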

[0023] Subject of the present invention are therefore DNA sequences or fragments thereof which are part of the genes responsible for human growth (or, in the case of genetic defects in these genes, for short stature). Three genes responsible for human growth were identified: SHOX, pET92 and SHOT. DNA sequences or fragments of these genes, as well as the respective full-length DNA sequences, can be inserted into an appropriate vector and transfected into cells. When such vectors are introduced into cells in an appropriate way, so that the genes are present as they are in healthy humans, it is conceivable to treat diseases involving short stature, e.g. Turner syndrome, by modern means of gene therapy. For example, short stature can be treated by removing the respective mutated growth genes responsible for short stature. It is also possible to stimulate the respective genes which compensate for the action of the genes responsible for short stature, e.g. by inserting DNA sequences before, after or within the growth/short stature genes in order to increase the expression of the healthy alleles. By such modifications, the growth/short stature genes become activated or silenced, respectively. This can be accomplished by inserting DNA sequences at appropriate sites within or adjacent to the gene, so that the inserted DNA sequences interfere with the growth/short stature genes and thereby activate or prevent their transcription. It is also conceivable to insert a regulatory element (e.g. a promoter sequence) upstream of said growth genes to activate them. It is further conceivable to stimulate the respective promoter sequence in order to overexpress the healthy functional allele in the case of Turner syndrome and thereby compensate for the missing allele. The modification of genes can generally be achieved by inserting exogenous DNA sequences into the growth/short stature gene via homologous recombination.

[0024] The DNA sequences according to the present invention can also be introduced into animals, such as mammals, via an appropriate vector system. These transgenic animals can then be used in in vivo investigations for screening or identifying pharmaceutical agents which are useful in the treatment of diseases involving short stature. If the animals respond positively to the administration of a candidate compound or agent, such agent or compound or derivatives thereof would be candidates for pharmaceutical agents. By appropriate means, the DNA sequences of the present invention can also be used in genetic experiments aimed at finding methods to compensate for the loss of genes responsible for short stature (knock-out animals).

[0025] As a further object of this invention, the DNA sequences can also be introduced into cells. These cells can be used for identifying pharmaceutical agents useful for the treatment of diseases involving short stature, or for screening such compounds or libraries of compounds. In an appropriate test system, changes in the phenotype or in the expression pattern of these cells can be determined, thereby allowing the identification of promising candidate agents in the development of pharmaceutical drugs.

[0026] The DNA sequences of the present invention can also be used to design appropriate primers which hybridize with segments of the short stature genes or fragments thereof under stringent conditions. Appropriate primer sequences can be constructed which are useful in the diagnosis of people who have a genetic defect causing short stature. In this respect it is noteworthy that the two mutations found occur at the identical position, suggesting that a mutational hot spot exists.

[0027] In general, DNA sequences according to the present invention are understood to also embrace DNA sequences which are degenerate with respect to the specific sequences shown, owing to the degeneracy of the genetic code, or which hybridize under stringent conditions with the specifically shown DNA sequences.
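The degeneracy referred to in [0027] means that several codons can specify the same amino acid, so DNA sequences that differ in nucleotide sequence may still encode an identical polypeptide. A minimal sketch using the standard genetic code (written as the usual compact 64-letter encoding of the codon table) illustrates this with the Gln and Glu codons that open and close the SHOX homeobox (CAG and GAG). The names are illustrative only.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (length a multiple of 3); '*' marks a stop."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two different DNA sequences encoding the same dipeptide Gln-Glu.
print(translate("CAGGAG"), translate("CAAGAA"))  # QE QE
```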
[0028] The present invention encompasses especially the following aspects: 

a) An isolated human nucleic acid molecule encoding polypeptides containing a homeobox domain of sixty amino acids having the amino acid sequence of SEQ ID NO: 1 and having regulating activity on human growth.

b) An isolated DNA molecule comprising the nucleotide sequence essentially as indicated in fig. 2, fig. 3 or fig. 4, 
and especially as shown in SEQ ID NO: 10, SEQ ID NO: 12 or SEQ ID NO: 15. 

c) DNA molecules capable of hybridizing to the DNA molecules of item b). 
d) DNA molecules of item c) above which are capable of hybridization with the DNA molecules of item b) at a temperature of 60-70 °C in the presence of a standard buffer solution.

e) DNA molecules comprising a nucleotide sequence having a homology of seventy percent or higher with the nucleotide sequence of SEQ ID NO: 10, SEQ ID NO: 12 or SEQ ID NO: 15 and encoding a polypeptide having regulating activity on human growth.

f) Human growth proteins having the amino acid sequence of SEQ ID NO: 11, 13 or 16 or a functional fragment thereof.

g) Antibodies obtained from immunization of animals with human growth proteins of item f) or antigenic variants 
thereof. 

h) Pharmaceutical compositions comprising human growth proteins or functional fragments thereof for treating disorders caused by genetic mutations of the human growth gene.

i) A method of screening for a substance effective for the treatment of the disorders mentioned above under item h), comprising detecting messenger RNA hybridizing to any of the DNA molecules described in a) - e) so as to measure any enhancement in the expression levels of the DNA molecule in response to treatment of the host cell with that substance.

j) An expression vector or plasmid containing any of the nucleic acid molecules described in a) - e) above which 
enables the DNA molecules to be expressed in mammalian cells. 

k) A method for the determination of the gene or genes responsible for short stature in a biological sample of body 
tissues or body fluids. 
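Item e) does not state how the seventy percent homology is calculated. One common convention, assumed here, is percent identity over an existing pairwise alignment. The sketch below takes two pre-aligned sequences of equal length, counts gap positions as mismatches and applies the 70% threshold; the function names are hypothetical, and producing the alignment itself is outside its scope.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Gap characters ('-') in either sequence count as mismatches.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    matches = sum(
        a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())
    )
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 70.0) -> bool:
    """True if the aligned sequences reach the identity threshold (default 70%)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy aligned sequences, not taken from the sequence listing.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # 80.0
```

Other conventions (identity over the shorter sequence, or excluding gap columns) can give different values for the same pair, which is why the denominator is made explicit here.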


[0029] In method k) above, nucleotide amplification techniques, e.g. PCR, are preferably used for detecting specific nucleotide sequences. Such techniques are known to persons skilled in the art and are described, for example, by Mullis et al., 1986, Cold Spring Harbor Symp. Quant. Biol. 51, 263-273, and Saiki et al., 1988, Science 239, 487-491, which are incorporated herein by reference. The short stature nucleotide sequences to be determined are mainly those represented by SEQ ID No. 2 to SEQ ID No. 7.

[0030] In principle, all oligonucleotide primers and probes for amplifying and detecting a genetic defect responsible for diminished human growth in a biological sample are suitable for amplifying a target short stature associated sequence. In particular, suitable exon-specific primer pairs according to the invention are provided in Table 1. Amplification is followed by a suitable detection step, e.g. using a radioactive or non-radioactive label.

Table 1:

Exon          Sense primer   Antisense primer   Product (bp)   Annealing temp. Ta (°C)
5'-I (G310)   SP1            ASP1               194            58
3'-I (G310)   SP2            ASP2               295            58
II (ET93)     SP3            ASP3               262            76/72/68
III (ET45)    SP4            ASP4               120            65
IV (G108)     SP5            ASP5               154            62
Va (SHOXa)    SP6            ASP6               265            61

Explanation of the abbreviations for the primers:

SP1:   ATTTCCAATGGAAAGGCGTAAATAAC
SP2:   ACGGCTTTTGTATCCAAGTC l'lTl G
SP3:   GCCCTGTGCCCTCCGCTCCC
SP4:   GGCTCTTCACATCTCTCTCTGCTTC
SP5:   CCACACTGACACCTGCTCCCTTTG
SP6:   CCCGCAGGTCCAGGCTCAGCTG
ASP1:  CGCCTCCGCCGTTACCGTCCTTG
ASP2:  CCCTGGAGCCGGCGCGCAAAG
ASP3:  CCCCGCCCCCGCCCCCGG
ASP4:  CTTCAGGTCCCCCCAGTCCCG
ASP5:  CTAGGGATCTTCAGAGGAAGAAAAAG
ASP6:  GCTGCGCGGCGGGTCAGAGCCCCAG
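For orientation, the primer sequences listed for Table 1 can be characterized with two simple measures: GC content and the Wallace rule of thumb, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The Wallace rule is intended for short oligonucleotides and gives only a rough estimate for 18- to 26-mers such as these; the text does not state how the annealing temperatures in Table 1 were chosen, and this sketch does not reproduce them. SP2 is left out because its sequence is not fully legible.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule of thumb: 2 degC per A/T plus 4 degC per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Three primers from Table 1, chosen for illustration.
primers = {
    "SP1": "ATTTCCAATGGAAAGGCGTAAATAAC",
    "ASP1": "CGCCTCCGCCGTTACCGTCCTTG",
    "ASP3": "CCCCGCCCCCGCCCCCGG",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Wallace Tm {wallace_tm(seq)} degC")
```

The all-GC primer ASP3 pairs with the touchdown program listed for exon II (76/72/68 °C), which is consistent with a GC-rich target, although the text gives no reason for that choice.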



[0031] Single-stranded RNA can also be used as a target. Methods for reverse transcribing RNA into cDNA are also well known and are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, New York, Cold Spring Harbor Laboratory, 1989. Alternatively, preferred methods for reverse transcription utilize thermostable DNA polymerases having RT activity.

[0032] Further, the technique described above can be used to select, from a group of persons of short stature, those whose short stature is caused by a genetic defect, which in turn allows a more specific medical treatment.
[0033] As a further subject of the present invention, the transcription factors A, B and C can be used as pharmaceutical agents. These transcription factors initiate a still unknown cascade of biological effects at the molecular level involved in human growth. These proteins or functional fragments thereof have a mitogenic effect on various cells; in particular, they have an osteogenic effect. They can be used in the treatment of bone diseases, e.g. osteoporosis, and in particular of all diseases involving disturbed regulation of bone calcium.

[0034] As used herein, the term "isolated" refers to the original derivation of the DNA molecule by cloning. It is to be understood, however, that this term is not intended to be so limiting; in fact, the present invention relates to both naturally occurring and synthetically prepared sequences, as will be understood by the person skilled in the art.
[0035] The DNA molecules of this invention may be used in forms of gene therapy involving the use of an expression plasmid prepared by incorporating an appropriate DNA sequence of this invention downstream from an expression promoter that effects expression in a mammalian host cell. Suitable host cells are prokaryotic or eukaryotic cells. Prokaryotic host cells are, for example, E. coli, Bacillus subtilis and the like. These host cells can be transfected with the desired gene or cDNA using replicons derived from species compatible with the host, that is, plasmid vectors containing a replication origin and regulatory sequences. Such vectors preferably carry a sequence that confers on the transfected cells a property (phenotype) by which they can be selected. For example, for E. coli hosts the strain E. coli K12 is typically used, and either pBR322 or pUC plasmids can generally be employed as the vector. Examples of suitable promoters for E. coli hosts are the trp promoter, lac promoter or lpp promoter. If desired, secretion of the expression product through the cell membrane can be effected by connecting a DNA sequence coding for a signal peptide at the 5' upstream side of the gene. Eukaryotic host cells include cells derived from vertebrates, yeast, etc. As vertebrate host cells, COS cells (Cell, 1981, 23: 175-182) or CHO cells can be used. Preferably, a promoter positioned 5' upstream of the gene to be expressed is used, together with RNA splice sites, polyadenylation and transcription termination sequences.
[0036] The transcription factors A, B and C of the present invention can be used to treat disorders caused by mutations in the human growth genes and can be used as growth-promoting agents. Owing to the polymorphism known to occur in eukaryotic genes, one or more amino acids may be substituted. Also, one or more amino acids can be deleted or inserted at one or more sites in the amino acid sequence of the polypeptides of SEQ ID NO: 11, 13 or 16. Such polypeptides are generally referred to as equivalent polypeptides as long as the underlying biological activity of the unmodified polypeptide remains essentially unchanged.

[0037] The present invention is illustrated by the following examples.

Example 1 
Patients 


[0038] All six patients studied had de novo sex chromosome aberrations. 

[0039] CC is a girl with the karyotype 45,X/46,X,psu dic(X)(Xqter->Xp22.3::Xp22.3->Xqter). At the last examination, at 6 1/2 years of age, her height was 114 cm (25th-50th percentile). Her mother's height was 155 cm; the father was not available for analysis. For details, see Henke et al., 1991.
[0040] GA is a girl with the karyotype 46,X,der(X)(3pter->3p23::Xp22.3->Xqter). At the last examination, at 17 years, normal stature (159 cm) was observed. Her mother's height is 160 cm and her father's height 182 cm. For details, see Kulharya et al., 1995.

[0041] SS is a girl with a karyotype 46,X rea(X)(Xqter -> Xq26::Xp22.3 -> Xq26:). At 11 years her height remained
below the 3rd percentile growth curve for Japanese girls; her predicted adult height (148.5 cm) was below her target
height (163 cm) and target range (155 to 191 cm). For details, see Ogata et al., 1992.

[0042] AK is a girl with a karyotype 46,X rea(X)(Xqter -> Xp22.3::Xp22.3 -> Xp21.3:). At 13 years her height remained
below the 2nd percentile growth curve for Japanese girls; her predicted adult height (142.8 cm) was below her target
height (155.5 cm) and target range (147.5-163.5 cm). For details, see Ogata et al., 1995.

[0043] RY: the karyotype of the ring Y patient is 46,X,r(Y)/46,X,dic r(Y)/45,X [95:3:2], as examined on 100 lymphocytes.
At 16 years of age his final height was 148 cm; the heights of his three brothers are all in the normal range: 170 cm
(brother 1, 16 years), 164 cm (brother 2, 14 years) and 128 cm (brother 3, 9 years). Growth retardation in this patient
is so severe that it would also be compatible with an additional deletion of the GCY locus on Yq.
[0044] AT: boy with ataxia and inv(X); normal height of 116 cm at age 7, parents' heights are 156 cm and 190 cm, 
respectively. 


Patients for mutation analysis: 

[0045] 250 individuals with idiopathic short stature were tested for mutations in SHOXa. The patients were selected
on the following criteria: height for chronological age below the 3rd centile of national height standards (below -2
standard deviation scores, SDS); no known causative disease, in particular normal weight (length) for gestational age,
normal body proportions, no chronic organic disorder, normal food intake, no psychiatric disorder, no skeletal dysplasia,
and no thyroid or growth hormone deficiency.
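The height criterion above is expressed as a standard deviation score, SDS = (height - reference mean) / reference SD for the child's age and sex. A minimal sketch of that arithmetic follows; the reference mean and SD in the example are hypothetical placeholders, not values from any national growth standard.

```python
def height_sds(height_cm: float, ref_mean_cm: float, ref_sd_cm: float) -> float:
    """Standard deviation score: distance from the age/sex reference mean in SD units."""
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (height_cm - ref_mean_cm) / ref_sd_cm


def meets_height_criterion(height_cm: float, ref_mean_cm: float, ref_sd_cm: float,
                           threshold_sds: float = -2.0) -> bool:
    """True if the height lies strictly below the threshold (default -2 SDS)."""
    return height_sds(height_cm, ref_mean_cm, ref_sd_cm) < threshold_sds


# HYPOTHETICAL reference values for one age/sex group (illustration only).
REF_MEAN_CM, REF_SD_CM = 120.0, 5.0

print(height_sds(108.0, REF_MEAN_CM, REF_SD_CM))              # -2.4
print(meets_height_criterion(108.0, REF_MEAN_CM, REF_SD_CM))  # True
```

In practice the reference mean and SD must be taken from the national standard for the patient's exact age and sex.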

Family A: 


[0046] Cases 1 and 2 are short-statured children of a German non-consanguineous family. The boy (case 1) was
born at the 38th week of gestation by caesarean section. Birth weight was 2660 g, birth length 47 cm. He developed
normally except for subnormal growth. On examination at the age of 6.4 years, he was proportionately small (106.8 cm,
-2.6 SDS) and obese (22.7 kg), but otherwise normal. His bone age was not retarded (6 years), and bone dysplasia was
excluded by X-ray analysis. Serum IGF-I and IGFBP-3 levels as well as thyroid parameters rendered GH or thyroid
hormone deficiency unlikely. The girl (case 2) was born at term by caesarean section. Birth weight was 2920 g, birth
length 47 cm. Her developmental milestones were normal, but by the age of 12 months poor growth was apparent
(length: 67 cm, -3.0 SDS). At 4 years her height was 89.6 cm (-3.6 SDS). No dysmorphic features or disproportions
were apparent. She was not obese (13 kg). Her bone age was 3.5 years, and bone dysplasia was excluded. Hormone
parameters were normal. Notably, both the girl and the boy grow along the 50th percentile growth curve
for females with Turner syndrome. The mother is the smallest member of the family and has a mild rhizomelic disproportion
(142.3 cm, -3.8 SDS). One of her two sisters (150 cm, -2.5 SDS) and the maternal grandmother (153 cm, -2.0 SDS)
are also short, without any disproportion. The other sister has normal stature (167 cm, +0.4 SDS). The father's height is
166 cm (-1.8 SDS) and the maternal grandfather's height is 165 cm (-1.9 SDS). The other patient was of Japanese origin
and showed the identical mutation.

Example 2 


Identification of the short stature gene 

A. In situ hybridization 

a) Fluorescence in situ hybridization (FISH)

[0047] Fluorescence in situ hybridization (FISH) was carried out using cosmids residing in the Xp/Yp pseudoautosomal
region (PAR1): 64/75cos (LLNLc110H032), E22cos (2e2), F1/14cos (110A7), M1/70cos (110E3), P99F2cos (43C11),
P99cos (LLNLc110P2410), B6cosb (ICRFc104H0425), F20cos (34F5), F21cos (ICRFc104G0411), F3cos2 (9E3),
F3cos1 (11E6), P117cos (29B11), P6cos1 (ICRFc104P0117), P6cos2 (LLNLc110E0625) and E4cos (15G7), according
to published methods (Lichter and Cremer, 1992). In brief, one microgram of the respective cosmid clone was labeled
with biotin and hybridized to human metaphase chromosomes under conditions that suppress signals from repetitive
DNA sequences. The hybridization signal was detected via FITC-conjugated avidin. FITC images were taken with a
cooled charge-coupled device camera system (Photometrics, Tucson, AZ).

b) Physical mapping 

[0048] Cosmids were derived from the Lawrence Livermore National Laboratory X- and Y-chromosome libraries and the
Imperial Cancer Research Fund London (now Max Planck Institute for Molecular Genetics, Berlin) X chromosome
library. Using cosmids distal to DXYS15, namely E4cos, P6cos2, P6cos1, P117cos and F3cos1, it was determined that
two copies of E4cos, P6cos2 and P6cos1 are still present, but only one copy of P117cos and F3cos1. The breakpoints of
both patients AK and SS map on cosmid P6cos1, at a maximum physical distance of 10 kb from each other. It was
concluded that the abnormal X chromosomes of AK and SS have deleted about 630 kb of DNA.
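The reasoning in [0048] (two FISH signals: the cosmid survives on the rearranged X; one signal: it is deleted) amounts to finding where the copy number changes along an ordered cosmid map. A minimal sketch, assuming the cosmids are in map order as listed in the text and simplifying each FISH result to an integer copy number, so a partial-signal breakpoint cosmid is not modelled:

```python
from typing import List, Optional, Tuple


def locate_breakpoint(ordered: List[Tuple[str, int]]) -> Optional[Tuple[str, str]]:
    """Return the adjacent cosmid pair where copy number drops from 2 to 1.

    `ordered` lists (cosmid, copies) in physical map order; 2 means the cosmid is
    retained on the rearranged X, 1 means it is deleted. Returns None if no
    2 -> 1 transition is present.
    """
    for (left, n_left), (right, n_right) in zip(ordered, ordered[1:]):
        if n_left == 2 and n_right == 1:
            return left, right
    return None


# Copy numbers reported for patients AK/SS in [0048], in the order listed there.
ak_ss = [("E4cos", 2), ("P6cos2", 2), ("P6cos1", 2), ("P117cos", 1), ("F3cos1", 1)]
print(locate_breakpoint(ak_ss))  # ('P6cos1', 'P117cos')
```

The text localizes the breakpoints more precisely, within P6cos1 itself; resolving that requires signal-intensity information that this integer model deliberately omits.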
[0049] Further cosmids were derived from the ICRF X chromosome-specific cosmid library (ICRFc104), the Lawrence
Livermore X chromosome-specific cosmid library (LLNLc110) and the Y chromosome-specific library (LLC03'M'), as
well as from a self-made cosmid library covering the entire genome. Cosmids were identified by hybridisation with all
known probes mapping to this region and by using entire YACs as probes. Where overlaps could not be proven using
known probes, end probes from several cosmids were used to verify them.


c) Southern Blot Hybridisation 

[0050] Southern blot analysis using different pseudoautosomal markers has provided evidence that the breakpoint
on the X chromosome of patient CC resides between DXYS20 (3cosPP) and DXYS60 (U7A) (Henke et al., 1991). To
confirm this finding and to refine the breakpoint location, cosmids 64/75cos, E22cos, F1/14cos, M1/70cos,
F2cos, P99F2cos and P99cos were used as FISH probes. The breakpoint on the abnormal X of patient CC could be
located between cosmids 64/75cos (one copy) and F1/14cos (two copies) on the E22 PAC. Patient CC, who has normal
stature, has therefore lost approximately 260-290 kb of DNA.

[0051] Southern blot hybridisations were carried out under high-stringency conditions in Church buffer (0.5 M NaPi
pH 7.2, 7% SDS, 1 mM EDTA) at 65°C and washed in 40 mM NaPi, 1% SDS at 65°C.
d) FISH Analysis

[0052] Biotinylated cosmid DNA (insert size 32-45 kb) or cosmid fragments (10-16 kb) were hybridised to metaphase
chromosomes from stimulated lymphocytes of patients under conditions as described previously (Lichter and Cremer,
1992). The hybridised probe was detected via avidin-conjugated FITC.

e) PCR Amplification 

[0053] All PCRs were performed in 50 µl volumes containing 100 pg-200 ng template, 20 pmol of each primer, 200
µM dNTPs (Pharmacia), 1.5 mM MgCl2, 75 mM Tris/HCl pH 9, 20 mM (NH4)2SO4, 0.01% (w/v) Tween 20 and 2 U of
Goldstar DNA Polymerase (Eurogentec). Thermal cycling was carried out in a Thermocycler GeneE (Techne).
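The final concentrations above translate into pipetting volumes via C1·V1 = C2·V2. The sketch below assumes stock concentrations that the document does not give (10 µM primers, 10 mM dNTPs, 25 mM MgCl2, a 10x buffer supplying the Tris/HCl, (NH4)2SO4 and Tween 20, polymerase at 5 U/µl, and 1 µl of template); these are illustrative assumptions to be replaced with the actual stocks.

```python
TOTAL_UL = 50.0  # reaction volume from [0053]


def dilution_volume(final: float, stock: float, total_ul: float = TOTAL_UL) -> float:
    """C1*V1 = C2*V2  ->  V1 = C2*V2/C1 (final and stock in the same units)."""
    return final * total_ul / stock


def reaction_recipe(template_ul: float = 1.0) -> dict:
    """Volumes (ul) per 50 ul reaction; ALL stock concentrations are assumptions."""
    primer_final_uM = 20.0 / TOTAL_UL  # 20 pmol in 50 ul = 0.4 uM
    recipe = {
        "10x buffer": dilution_volume(1.0, 10.0),                  # 1x final
        "MgCl2 (25 mM stock)": dilution_volume(1.5, 25.0),         # 1.5 mM final
        "dNTPs (10 mM stock)": dilution_volume(200.0, 10_000.0),   # 200 uM final
        "forward primer (10 uM)": dilution_volume(primer_final_uM, 10.0),
        "reverse primer (10 uM)": dilution_volume(primer_final_uM, 10.0),
        "polymerase (5 U/ul)": 2.0 / 5.0,                          # 2 U per reaction
        "template": template_ul,
    }
    recipe["water"] = TOTAL_UL - sum(recipe.values())
    if recipe["water"] < 0:
        raise ValueError("components exceed the reaction volume")
    return recipe


for name, ul in reaction_recipe().items():
    print(f"{name:26s} {ul:5.2f} ul")
```

With these assumed stocks the components add up to 14.4 µl, leaving 35.6 µl of water; a master mix is simply these volumes multiplied by the number of reactions.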

f) Exon Amplification 

[0054] Four cosmid pools, each consisting of four to five clones from the cosmid contigs, were used for exon
amplification experiments. The cosmids in each pool were partially digested with Sau3A. Gel-purified fractions in
the size range of 4-10 kb were cloned into the BamHI-digested pSPL3B vector (Burn et al., 1995) and used for the exon
amplification experiments as previously described (Church et al., 1994).

g) Genomic Sequencing

[0055] Sonicated fragments of the two cosmids LLOYNC03'M' 15D10 and LLOYNC03'M' 34F5 were subcloned
separately into M13mp18 vectors. From each cosmid library at least 1000 plaques were picked; M13 DNA was prepared
and sequenced using dye terminators, ThermoSequenase (Amersham) and the universal M13 primer (MWG-BioTech).
The gels were run on ABI-377 sequencers, and the data were assembled and edited with the GAP4 program (Staden).
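How well 1000 random M13 subclones per cosmid cover the insert can be estimated with the Lander-Waterman model: redundancy c = N·L/G, and the expected fraction of bases not covered by any read is about e^(-c). The read length (500 bases) and insert size (40 kb, within the 32-45 kb range given in [0052]) used below are assumptions for illustration, not values from the document.

```python
import math


def shotgun_coverage(n_reads: int, read_len_bp: float, target_bp: float) -> tuple:
    """Lander-Waterman estimate for random shotgun sequencing.

    Returns (redundancy c, expected fraction of target bases left uncovered).
    """
    c = n_reads * read_len_bp / target_bp
    return c, math.exp(-c)


# ASSUMED: ~500-base reads and a ~40 kb cosmid insert; 1000 plaques per cosmid as in [0055].
c, gap_fraction = shotgun_coverage(1000, 500, 40_000)
print(f"redundancy {c:.1f}x, uncovered fraction {gap_fraction:.1e}")  # redundancy 12.5x, uncovered fraction 3.7e-06
```

The model assumes uniformly random clone positions, so real cloning bias and sequence-dependent gaps make it an optimistic estimate.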
[0056] Of all six patients, GA had the least well characterized chromosomal breakpoint. The most distal markers
previously tested for their presence or absence on the X were DXS1060 and DXS996, which map approximately 6 Mb
from the telomere (Nelson et al., 1995). Several cosmids containing different gene sequences from within PAR1 (MIC2,
ANT3, CSF2RA and XE7) were tested, and all were present on the translocation chromosome. Cosmids from within
the short stature critical region e.g., [...] chromosome, thereby placing the translocation breakpoint on cosmid M1/70cos.
A quantitative comparison of the signal intensities of M1/70cos between the normal and the rearranged X indicates
that approximately 70% of this cosmid is deleted.



TABLE 2

Table 2: This table summarizes the FISH data for the 16 cosmids tested on four patients (CC, GA, AK and SS). [Table body not recoverable; surviving row labels include F21cos, F3cos2, F3cos1, P117cos, P6cos1, P6cos2 and E4cos.]

[ - ] one copy: the respective cosmid is deleted on the rearranged X but present on the normal X chromosome.

[ + ] two copies: the respective cosmid is present on both the rearranged and the normal X chromosome.

[(+)] breakpoint region: FISH shows that the breakpoint occurs within the respective cosmid.



[0057] In summary, the molecular analysis of six patients with X chromosomal rearrangements, using fluorescence-
labeled cosmid probes and in situ hybridization, indicates that the short stature critical region can be narrowed down
to a 270 kb interval, bounded by the breakpoint of patient GA on its centromere-distal side and by the breakpoints of
patients AK and SS on its centromere-proximal side.

[0058] Genotype-phenotype correlations can be informative and were used to delineate the short stature
critical interval on the human X and Y chromosomes. In the present study, FISH analysis was used to study metaphase
spreads and interphase nuclei of lymphocytes from patients carrying deletions and translocations on the X chromosome
with breakpoints within Xp22.3. In two of the four patients (AK and SS) these breakpoints appear to be clustered,
presumably owing to the presence of sequences predisposing to chromosome rearrangements. One additional patient,
RY (ring Y), has been found with an interruption in the 270 kb critical region, thereby reducing the critical interval to a
170 kb region.

[0059] By correlating the height of all six individuals with their deletion breakpoints, an interval of 170 kb was mapped
within the pseudoautosomal region whose presence or absence has a significant effect on stature. This interval
is bounded distally by the X chromosomal breakpoint of patient GA at 340 kb from the telomere (Xptel) and proximally
by the breakpoints of patients AT and RY at 510/520 kb from Xptel. This assignment reduces the critical interval to
almost one fourth of its previous size (Ogata et al., 1992; Ogata et al., 1995). A small set of six to eight cosmids is
now available for FISH experiments to test for the prevalence and significance of this genomic locus in a large series
of patients with idiopathic short stature.
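The deletion-mapping logic of [0057]-[0059] can be sketched as follows. In a simplified model where each patient is reduced to one breakpoint position (kb from Xptel), a terminal deletion in a patient of normal stature excludes everything it removes, so the distal bound is the largest such deletion end; a breakpoint in a short-statured patient is taken to remove or disrupt the gene, so the proximal bound is the smallest such breakpoint. The inputs below are the 340 kb and 510 kb bounds quoted in the text, not a full patient data set.

```python
from typing import Iterable, Tuple


def critical_interval(normal_deletion_ends_kb: Iterable[float],
                      short_breakpoints_kb: Iterable[float]) -> Tuple[float, float]:
    """Smallest interval (kb from Xptel) consistent with simple deletion mapping.

    - A deletion ending at x kb in a patient of normal stature places the gene
      proximal to x, so the distal bound is the largest such x.
    - A breakpoint at y kb in a short-statured patient places the gene distal
      to y, so the proximal bound is the smallest such y.
    """
    distal = max(normal_deletion_ends_kb)
    proximal = min(short_breakpoints_kb)
    if proximal <= distal:
        raise ValueError("no interval is consistent with these breakpoints")
    return distal, proximal


# GA (normal stature) at 340 kb; 510 kb used as the proximal bound, a simplification
# of the AT/RY breakpoints quoted as 510/520 kb in [0059].
lo, hi = critical_interval([340], [510])
print(lo, hi, hi - lo)  # 340 510 170
```

Each additional informative patient can only shrink the interval, which is why a larger series of patients tested with the same cosmids narrows the locus further.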



B. Identification of the Candidate Short Stature Gene 



[0060] To search for transcription units within the smallest, 170 kb critical region, exon trapping and cDNA selection
were carried out on six cosmids (110E3, F2cos, 43C11, P2410, 15D10, 34F5). Three different positive clones (ET93,
ET45 and G108) were isolated by exon trapping, all of which mapped back to cosmid 34F5. Previous studies using
cDNA selection protocols and more than 25 different cDNA libraries had been unsuccessful, suggesting that genes
in this interval are expressed at very low abundance.

[0061] To find out whether any gene in this interval had been missed, the nucleotide sequence of about 140 kb from this
region of PAR1 was determined, using the random M13 method and dye terminator chemistry. The cosmids for
sequence analysis were chosen to overlap minimally with each other and to collectively span the critical interval. DNA
sequence analysis and subsequent protein prediction by the "X Grail" program, version 1.3c, and by the exon-
trapping program FEXHB confirmed all three previously cloned exons. No protein-coding genes other
than the previously isolated one could be detected.


C. Isolation of the Short Stature Candidate Gene SHOX 



[0062] Assuming that all three exon clones ET93, ET45 and G108 are part of the same gene, they were used
collectively as probes to screen 14 different cDNA libraries from 12 different fetal (lung, liver, brain 1 and 2) and adult
tissues (ovary, placenta 1 and 2, fibroblast, skeletal muscle, bone marrow, brain, brain stem, hypothalamus, pituitary).
Not a single positive clone was detected among approximately 14 million plated clones. To isolate the full-length
transcript, 3' and 5'RACE were carried out. For 3'RACE, primers from exon G108 were used on RNA from placenta,
skeletal muscle and bone marrow fibroblasts, tissues in which G108 had been shown to be expressed. Two different
3'RACE clones of 1173 and 652 bp were derived from all three tissues, suggesting that two different 3' exons, a and b,
exist. The two forms were termed SHOXa and SHOXb.

[0063] To increase the chances of isolating the complete 5' portion of a gene known to be expressed at low abundance,
a HeLa cell line was treated with retinoic acid and the phorbol ester PMA. RNA from this induced cell line and RNA from
placenta and skeletal muscle were used to construct a 'Marathon cDNA library'. Identical 5'RACE cDNA
clones were isolated from all three sources.

Experimental procedure: 


RT-PCR and cDNA Library Construction 

[0064] Human polyA+ RNA of heart, pancreas, placenta, skeletal muscle, fetal kidney and liver was purchased from
Clontech. Total RNA was isolated from a bone marrow fibroblast cell line with TRIZOL reagent (Gibco-BRL) as described
by the manufacturer. First-strand cDNA synthesis was performed with the Superscript first-strand cDNA synthesis kit
(Gibco-BRL), starting with 100 ng polyA+ RNA or 10 µg total RNA and using the oligo(dT)-adapter primer
GGCCACGCGTCGACTAGTAC(dT)20N. After first-strand cDNA synthesis the reaction mix was diluted 1:10, and 5 µl of
this dilution was used for further PCR experiments.

[0065] A 'Marathon cDNA library' was constructed from skeletal muscle and placenta polyA+ RNA with the Marathon
cDNA amplification kit (Clontech) as described by the manufacturer.

[0066] Fetal brain (catalog # HL5015b), fetal lung (HL3022a), ovary (HL1098a), pituitary gland (HL1097v) and
hypothalamus (HL1172b) cDNA libraries were purchased from Clontech. Brain, kidney, liver and lung cDNA libraries were
part of the quick screen human cDNA library panel (Clontech). The fetal muscle cDNA library was obtained from the UK
Human Genome Mapping Project Resource Centre.


D. Sequence Analysis and Structure of SHOX Gene 

[0067] Consensus sequences of SHOXa and SHOXb (1870 and 1349 bp, respectively) were assembled by analysis of
sequences from the 5' and 3'RACE-derived clones. Each transcript contains a single open reading frame, encoding
proteins of 292 (SHOXa) and 225 amino acids (SHOXb). Both transcripts share a common 5' end but have a different
last 3' exon, a finding suggestive of the use of alternative splicing signals. A complete alignment between the two cDNAs
and the sequenced genomic DNA from cosmids LLOYNC03'M'15D10 and LLOYNC03'M'34F5 was achieved, allowing
the exon-intron structure to be established (Fig. 4). The gene is composed of 6 exons ranging in size from 58 bp
(exon III) to 1146 bp (exon Va). Exon I contains a CpG island, the start codon and the 5' region. A stop codon as well
as the 3'-noncoding region is located in each of the alternatively spliced exons Va and Vb.
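Identifying an open reading frame means scanning for an ATG followed in frame by a stop codon. A minimal sketch for the forward strand only (three frames, standard stop codons, no reverse strand and no alternative start codons), run on a toy sequence rather than SHOX data:

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Longest ATG...stop ORF on the forward strand.

    Returns 0-based (start, end), end exclusive and including the stop codon,
    or None if no complete ORF exists.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):  # ran off the end without an in-frame stop
                break
            end = j + 3
            if best is None or end - i > best[1] - best[0]:
                best = (i, end)
            i = end  # resume scanning after this ORF
    return best


def orf_protein_length(start: int, end: int) -> int:
    """Number of amino acids encoded, excluding the stop codon."""
    return (end - start) // 3 - 1


toy = "CCATGAAATTTTAGCC"
start, end = longest_orf(toy)
print(start, end, toy[start:end], orf_protein_length(start, end))  # 2 14 ATGAAATTTTAG 3
```

Conversely, a protein of n residues needs a coding region of (n + 1) × 3 bp including the stop codon: 879 bp for the 292-residue SHOXa protein and 678 bp for the 225-residue SHOXb protein.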

Example 3 

[0068] Two cDNAs have been identified which map to the 160 kb region identified as critical for short stature. These
cDNAs correspond to the genes SHOX and pET92. The cDNAs were identified by hybridization of cosmid subclones
to cDNA libraries.

[0069] The set of cosmid clones with complete coverage of the critical region has now provided the genetic
material to identify the causative gene. Positional cloning projects aimed at isolating the genes from this region
are carried out by exon trapping and cDNA selection techniques. By virtue of their location within the pseudoautosomal
region, these genes can be assumed to escape X-inactivation and to exert a dosage effect.
[0070] The cloning of the gene that leads to short stature when absent (haploid) or deficient represents a further step
forward in diagnostic accuracy, providing the basis for mutational analysis within the gene by, e.g., single-strand
conformation polymorphism (SSCP). In addition, the cloning of this gene and its subsequent biochemical characterization
opens the way to a deeper understanding of the biological processes involved in growth control.

[0071] The DNA sequences of the present invention provide a first molecular test to identify individuals with a specific 
genetic disorder within the complex heterogeneous group of patients with idiopathic short stature. 

Example 4 


Expression Pattern of SHOXa and SHOXb 

[0072] Northern blot analysis using single exons as hybridisation probes revealed a different expression profile for
every exon, strongly suggesting that the bands of different size and intensity represent cross-hybridisation products
with other G+C-rich gene sequences. To obtain a more realistic expression profile of SHOXa and SHOXb, RT-PCR
experiments on RNA from different tissues were carried out. Whereas expression of SHOXa was observed in skeletal
muscle, placenta, pancreas, heart and bone marrow fibroblasts, expression of SHOXb was restricted to fetal kidney,
skeletal muscle and bone marrow fibroblasts, with by far the highest expression in bone marrow fibroblasts.
[0073] The expression of SHOXa in several cDNA libraries made from fetal brain, lung and muscle and from adult brain,
lung and pituitary, and of SHOXb in none of the tested libraries, gives additional evidence that one spliced form (SHOXa)
is more broadly expressed and the other (SHOXb) is expressed in a predominantly tissue-specific manner.
[0074] To assess the transcriptional activity of SHOXa and SHOXb on the X and Y chromosomes, we used RT-PCR
of RNA extracted from various cell lines containing the active X, the inactive X or the Y chromosome as the only human
chromosome. All cell lines revealed amplification products of the expected lengths of 119 bp (SHOXa) and 541 bp
(SHOXb), providing clear evidence that both SHOXa and SHOXb escape X-inactivation.

[0075] SHOXa and SHOXb encode novel homeodomain proteins. SHOX is highly conserved across species from
mammals to fish and flies. Besides the homeodomain, the very 5' end and the very 3' end are likely conserved
regions between man and mouse, indicating functional significance: differences in those amino acid regions have
not been allowed to accumulate during evolution between man and mouse.

Experimental procedures: 

a) 5' and 3'RACE

[0076] To clone the 5' end of the SHOXa and b transcripts, 5'RACE was performed using the constructed 'Marathon
cDNA libraries'. The following oligonucleotide primers were used: SHOX B rev, GAAAGGCATCCGTAAGGCTCCC
(position 697-718, reverse strand [r]), and the adaptor primer AP1. PCR was carried out using touchdown parameters:
94°C for 2 min; then 5 cycles of 94°C for 30 sec, 70°C for 30 sec, 72°C for 2 min; 5 cycles of 94°C for 30 sec, 66°C
for 30 sec, 72°C for 2 min; and 25 cycles of 94°C for 30 sec, 62°C for 30 sec, 72°C for 2 min. A second round of
amplification was performed using 1/100 of the PCR product and the following nested oligonucleotide primers: SHOX A
rev, GACGCCTTTATGCATCTGATTCTC (position 617-640 r), and the adaptor primer AP2. PCR was carried out for 35
cycles with an annealing temperature of 60°C.
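Touchdown schedules like the one above are easier to check when written as data. The sketch below encodes the first-round 5'RACE program (94°C for 2 min, then 5 + 5 + 25 cycles with annealing stepping 70, 66, 62°C) and sums cycle count and hold time. Ramp times are ignored, so the run time is a lower bound, and the data structure is an illustration, not a thermocycler file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

INITIAL_DENATURATION_S = 120  # 94 degC for 2 min


@dataclass
class Stage:
    cycles: int
    denature: Tuple[float, int]  # (degC, seconds)
    anneal: Tuple[float, int]
    extend: Tuple[float, int]

    def hold_seconds(self) -> int:
        return self.cycles * (self.denature[1] + self.anneal[1] + self.extend[1])


RACE5_ROUND1: List[Stage] = [
    Stage(5, (94, 30), (70, 30), (72, 120)),
    Stage(5, (94, 30), (66, 30), (72, 120)),
    Stage(25, (94, 30), (62, 30), (72, 120)),
]


def summarize(stages: List[Stage]) -> Tuple[int, int]:
    """Total cycles and total hold time in seconds (ramping ignored)."""
    cycles = sum(s.cycles for s in stages)
    seconds = INITIAL_DENATURATION_S + sum(s.hold_seconds() for s in stages)
    return cycles, seconds


cycles, seconds = summarize(RACE5_ROUND1)
print(cycles, seconds / 60)  # 35 107.0
```

The other touchdown programs in this example (for instance 5 + 5 + 35 cycles in the validation PCR) can be encoded the same way by changing the stage list.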

[0077] To clone the 3' end of the SHOXa and b transcripts, 3'RACE was performed as previously described (Frohman
et al., 1988) using oligo(dT)-adaptor-primed first-strand cDNA. The following oligonucleotide primers were used:
SHOX A for, GAATCAGATGCATAAAGGCGTC (position 619-640), and the oligo(dT) adaptor. PCR was carried out using
the following parameters: 94°C for 2 min; then 35 cycles of 94°C for 30 sec, 62°C for 30 sec, 72°C for 2 min. A second
round of amplification was performed using 1/100 of the PCR product and the following nested oligonucleotide
primers: SHOX B for, GGGAGCCTTACGGATGCCTTTC (position 697-718), and the oligo(dT) adaptor. PCR was carried
out for 35 cycles with an annealing temperature of 62°C.
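The RACE primers in [0076] and [0077] sit on opposite strands of the same stretches of cDNA: SHOX B for and SHOX B rev share positions 697-718, and SHOX A rev (617-640 r) contains the reverse complement of SHOX A for (619-640). A short check of this from the sequences given in the text, using a plain string reverse complement that handles A/C/G/T only (no IUPAC ambiguity codes):

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMERS = {
    "SHOX A for": "GAATCAGATGCATAAAGGCGTC",    # 619-640
    "SHOX A rev": "GACGCCTTTATGCATCTGATTCTC",  # 617-640 r
    "SHOX B for": "GGGAGCCTTACGGATGCCTTTC",    # 697-718
    "SHOX B rev": "GAAAGGCATCCGTAAGGCTCCC",    # 697-718 r
}

# Same 22-nt window on opposite strands: B rev is exactly revcomp(B for).
print(reverse_complement(PRIMERS["SHOX B for"]) == PRIMERS["SHOX B rev"])  # True
# A rev extends two bases further (617-618), so revcomp(A for) is its 5' prefix.
print(PRIMERS["SHOX A rev"].startswith(reverse_complement(PRIMERS["SHOX A for"])))  # True
```

This pairing is what allows the same sites to serve as outer primer in one RACE direction and nested primer in the other.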

[0078] To validate the sequences of the SHOXa and SHOXb transcripts, PCR was performed with a 5' and a 3'
oligonucleotide primer. For SHOXa the following primers were used: G310 for, AGCCCCGGCTGCTCGCCAGC
(position 59-78), and SHOX D rev, CTGCGCGGCGGGTCAGAGCCCCAG (position 959-982 r). For SHOXb the following
primers were used: G310 for, AGCCCCGGCTGCTCGCCAGC, and SHOX2A rev, GCCTCAGCAGCAAAGCAAGATCCC
(position 1215-1238 r). Both PCRs were carried out using touchdown parameters: 94°C for 2 min; then 5 cycles of 94°C
for 30 sec, 70°C for 30 sec, 72°C for 2 min; 5 cycles of 94°C for 30 sec, 68°C for 30 sec, 72°C for 2 min; and 35 cycles
of 94°C for 30 sec, 65°C for 30 sec, 72°C for 2 min. Products were gel-purified and cloned for sequence analysis.


b) SSCP Analysis 

[0079] SSCP analysis was performed on PCR-amplified genomic DNA from patients according to a previously described
method (Orita et al., 1989). One to five µl of the PCR products were mixed with 5 µl of denaturation solution containing
95% formamide and 10 mM EDTA pH 8 and denatured at 95°C for 10 min. Samples were immediately chilled on ice
and loaded on a 10% polyacrylamide gel (acrylamide:bisacrylamide = 37.5:1 and 29:1; Multislotgel, TGGE base, Qiagen)
containing 2% glycerol and 1x TBE. Gels were run at 15°C with 500 V for 3 to 5 hours and silver-stained as described
in the TGGE handbook (Qiagen, 1993).

c) Cloning and Sequencing of PCR Products

[0080] PCR products were cloned into pMOSBlue using the pMOSBlue T-Vector Kit from Amersham. Overnight
cultures of single colonies were lysed in 100 µl H2O by boiling for 10 min. The lysates were used as templates for
PCRs with primers specific for the cloned PCR product. SSCP of the PCR products allowed the identification of clones
containing different alleles. The clones were sequenced with Cy5-labelled vector primers Uni and T7 by the cycle
sequencing method described by the manufacturer (ThermoSequenase Kit, Amersham) on an ALF express automated
sequencer (Pharmacia).
d) PCR Screening of cDNA Libraries

[0081] To detect expression of SHOXa and b, PCR screening of several cDNA libraries and first-strand cDNAs
was carried out with SHOXa- and SHOXb-specific primers. For the cDNA libraries, a DNA equivalent of 5x10^8 pfu was
used. For SHOXa, the primers SHOX E rev, GCTGAGCCTGGACCTGTTGGAAAGG (position 713-737 r), and SHOX A for
were used. For SHOXb, the primers SHOX B for and SHOX2A rev were used. Both PCRs were carried out using
touchdown parameters: 94°C for 2 min; then 5 cycles of 94°C for 30 sec, 68°C for 30 sec, 72°C for 40 sec; 5 cycles of
94°C for 30 sec, 65°C for 30 sec, 72°C for 40 sec; and 35 cycles of 94°C for 30 sec, 62°C for 30 sec, 72°C for 40 sec.
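Expected product sizes follow from the primer coordinates: the product runs from the first base of the forward primer to the last template position covered by the reverse primer, i.e. r_end - f_start + 1 bp with 1-based inclusive positions. For SHOX A for (619-640) with SHOX E rev (713-737 r) this gives 119 bp, the SHOXa product size quoted in [0074]. A minimal sketch, assuming both coordinates refer to the same transcript numbering:

```python
from typing import Tuple


def amplicon_length(fwd: Tuple[int, int], rev: Tuple[int, int]) -> int:
    """Product length for primers given as 1-based inclusive (start, end) cDNA coordinates.

    The product runs from the first base of the forward primer to the last
    base covered by the reverse primer.
    """
    f_start, _ = fwd
    _, r_end = rev
    if r_end <= f_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return r_end - f_start + 1


SHOX_A_FOR = (619, 640)
SHOX_E_REV = (713, 737)  # reverse strand
G310_FOR = (59, 78)
SHOX_D_REV = (959, 982)  # reverse strand

print(amplicon_length(SHOX_A_FOR, SHOX_E_REV))  # 119
print(amplicon_length(G310_FOR, SHOX_D_REV))    # 924
```

The second value is a computed prediction for the SHOXa validation PCR of [0078], not a size reported in the text.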


Example 5 


Expression pattern of OG12, the putative mouse homolog of both SHOX and SHOT 

[0083] In situ hybridisation on mouse embryos ranging from day 5 p.c. to day 18.5 p.c., as well as on fetal and
newborn animals, was carried out to establish the expression pattern. Expression was seen in the developing limb
buds, in the mesoderm of the nasal processes, which contribute to the formation of the nose and palate, in the eyelid,
in the aorta, in the developing female gonads, in the developing spinal cord (restricted to differentiating motor neurons)
and in the brain. Based on this expression pattern and on the mapping position of its human homolog SHOT, SHOT
represents a likely candidate for Cornelia de Lange syndrome, which includes short stature.

Example 6

[0084] Isolation of a novel SHOX-like homeobox gene on chromosome three, SHOT, being related to human growth 
/ short stature 

[0085] A new gene called SHOT (for SHOX homolog on chromosome three) was isolated in human, sharing the
most homology with the murine OG12 gene and the human SHOX gene. The human SHOT gene and the murine OG12
gene are highly homologous, with 99% identity at the protein level. Although not yet proven, the striking homology
between SHOT and SHOX (identity within the homeodomain only) makes it likely that SHOT is also involved in short
stature or human growth.

[0086] SHOT was isolated using primers from two new human ESTs (HS 1224703 and HS 126759) from the EMBL
database to amplify reverse-transcribed RNA from a bone marrow fibroblast line (Rao et al., 1997). The 5' and 3'
ends of SHOT were generated by RACE-PCR from a bone marrow fibroblast library constructed according
to Rao et al., 1997. SHOT was mapped by FISH analysis to chromosome 3q25/q26, and the murine homolog to the
syntenic region on mouse chromosome 3. Based on the expression pattern of its mouse homolog OG12, SHOT
represents a candidate for Cornelia de Lange syndrome (which shows short stature and other features, including
craniofacial abnormalities), which has been mapped to this chromosomal interval on 3q25/26.

Example 7 

Searching for Mutations in Patients with Idiopathic Short Stature 


[0087] The DNA sequences of the present invention are used in PCR, LCR and other known technologies to determine
whether individuals with short stature have small deletions or point mutations in the short stature gene.
[0088] Initially 91, and eventually a total of 250, unrelated male and female patients with idiopathic short stature
(which has an estimated incidence of 2-2.5% in the general population) were tested for small rearrangements or point
mutations in the SHOXa gene. Six sets of PCR primers were designed to amplify not only single exons but also the
sequences flanking each exon and a small part of the 5'UTR. For the largest exon, exon one, two additional
internal-exon primers were generated. Primers used for PCR are shown in table 2.

[0089] Single-strand conformation polymorphism (SSCP) analysis of all amplified exons, ranging from 120 to 295 bp in
size, was carried out. Band mobility shifts were identified in only 2 individuals with short stature (Y91 and A1). Fragments
that gave altered SSCP patterns (unique SSCP conformers) were cloned and sequenced. To avoid PCR and sequencing
artifacts, both strands were sequenced from two independent PCR reactions. The mutation in patient Y91 resides 28 bp
5' of the start codon in the 5'UTR and involves a cytidine-to-guanine substitution. To find out whether this mutation
represents a rare polymorphism or is responsible for the phenotype by regulating gene expression, e.g. through weaker
binding of translation initiation factors, his parents and a sister were tested. As both the sister and the father, who have
normal height, also show the same SSCP variant (data not shown), this base substitution represents a rare
polymorphism unrelated to the phenotype.

[0090] Cloning and sequencing of a unique SSCP conformer from patient A1 revealed a cytidine-to-thymidine transition
(nucleotide 674), which introduces a termination codon at amino acid position 195 of the predicted 225- and
292-amino-acid sequences. To determine whether this nonsense mutation is genetically associated with the short
stature in the family, pedigree analysis was carried out. All six short individuals (defined as height below -2 standard
deviations) showed an aberrant SSCP shift and the cytidine-to-thymidine transition. Neither the father nor the aunt and
the maternal grandfather, all of normal height, showed this mutation, indicating that the grandmother transmitted the
mutated allele to two of her daughters and to her two grandchildren. Thus, the presence of the mutant allele and the
short stature phenotype are concordant in this family.
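Mapping a cDNA substitution onto the protein requires the position s of the A of the start codon: for a mutated nucleotide p, codon number = (p - s) // 3 + 1 and position within the codon = (p - s) % 3 + 1. The document does not state s here; the value s = 92 below is a hypothetical choice that places nucleotide 674 in codon 195, and the wild-type codon CGA is likewise an illustrative assumption. The sketch only shows the bookkeeping and how a C-to-T change at a codon's first position can create a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def locate_codon(cdna_pos: int, cds_start: int) -> tuple:
    """Return (codon number, position within codon), both 1-based.

    cdna_pos and cds_start are 1-based cDNA coordinates; cds_start is the A of ATG.
    """
    offset = cdna_pos - cds_start
    if offset < 0:
        raise ValueError("position lies in the 5'UTR")
    return offset // 3 + 1, offset % 3 + 1


def substitute(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Apply a single-base substitution at a 1-based position within a codon."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


CDS_START = 92  # HYPOTHETICAL start-codon position, for illustration only
codon_no, pos = locate_codon(674, CDS_START)
mutant = substitute("CGA", pos, "T")  # HYPOTHETICAL wild-type codon
print(codon_no, pos, mutant, mutant in STOP_CODONS)  # 195 1 TGA True
```

A variant upstream of the start codon, such as the 5'UTR change in patient Y91, has a negative offset and is rejected by `locate_codon`, since it does not alter any codon.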
[0091] The identical situation as indicated above was found in another short stature patient of Japanese origin. 

Example 8 


[0092] The DNA sequences of the present invention are used to characterize the function of the gene or genes. The
DNA sequences can be used as queries in searches of nucleic acid or amino acid databases to identify related
genes or gene products. The partial amino acid sequence of SHOX93 has been used as a query against amino acid
databases. The search showed very high homology to many known homeobox proteins. The cDNA sequences of
the present invention can be used to produce the peptide recombinantly. Various expression systems known to those
skilled in the art can be used for recombinant protein production.

[0093] By conventional peptide synthesis (according to the Merrifield method), a peptide having the sequence
CSKSFDQKSKDGNGG was synthesized, and polyclonal antibodies were raised in both rabbits and chickens according
to standard protocols.

[0140] Preferred embodiments of the invention are especially the following:

E1. An isolated human nucleic acid molecule encoding polypeptides containing a homeobox domain of sixty amino
acids having the amino acid sequence of SEQ ID NO: 1 and having regulating activity on human growth.
E2. A nucleic acid molecule according to embodiment E1 which is selected from the following group: 

a) an isolated DNA molecule comprising a nucleotide sequence (i) encoding a polypeptide containing a
homeobox domain of sixty amino acids having the amino acid sequence of SEQ ID NO: 1 and which has the biological
activity to regulate human growth, or (ii) encoding a polypeptide containing a homeobox domain of sixty amino
acids having the amino acid sequence of SEQ ID NO: 1 except that one or more amino acid residues have
been deleted, added or substituted but which retains the same biological activity of regulating human growth;

b) an isolated DNA molecule comprising the nucleotide sequence of SHOX ET93 [SEQ ID NO: 2] and the 
nucleotide sequence of SHOX ET45 [SEQ ID NO: 4] or fragments thereof; 

c) nucleic acid molecules capable of hybridizing to the DNA molecules of a) or b); and 

d) DNA molecules comprising a nucleotide sequence having a homology of seventy percent or higher with the DNA molecules of a) or b).

E3. A DNA molecule according to embodiment E2 which encodes a polypeptide having an N-terminal and/or C-terminal amino acid extension to the homeobox domain of sixty amino acids of SEQ ID NO: 1.
E4. A DNA molecule according to embodiment E3 which encodes a polypeptide having a length of 150 to 350 amino acids.

E5. A DNA molecule according to any of embodiments E2 - E4 further comprising the nucleotide sequence of SHOX G310 [SEQ ID NO: 3].

E6. A DNA molecule according to any of embodiments E2 - E5 further comprising the nucleotide sequence of SHOX G108 [SEQ ID NO: 5].

E7. A DNA molecule according to any of embodiments E2 - E6 further comprising the nucleotide sequence of SHOX Va [SEQ ID NO: 6] or SHOX Vb [SEQ ID NO: 7].

E8. A DNA molecule according to any of embodiments E1 - E4 which encodes a polypeptide which is selected 
from the following group: 

a) transcription factor A having essentially the amino acid sequence of [SEQ ID NO: 11];

b) transcription factor B having essentially the amino acid sequence of [SEQ ID NO: 13]; and 

c) transcription factor C having essentially the amino acid sequence of [SEQ ID NO: 16]. 

E9. DNA sequence comprising the nucleotide sequence of SHOX ET93 [SEQ ID No. 2]. 
E10. A DNA sequence according to embodiment E9 further comprising the nucleotide sequence of SHOX G310 [SEQ ID NO. 3].

E11. A DNA sequence according to embodiment E9 or E10 further comprising the nucleotide sequence of SHOX ET45 [SEQ ID NO. 4].

E12. A DNA sequence according to any of embodiments E9 - E11 further comprising the nucleotide sequence of SHOX G108 [SEQ ID NO. 5].

E13. A DNA sequence according to any of embodiments E9 - E12 further comprising either the nucleotide sequence of SHOX Va [SEQ ID NO. 6] or SHOX Vb [SEQ ID NO. 7].

E14. A DNA sequence according to embodiment E9 comprising the nucleotide sequence of SHOX ET93 [SEQ ID No. 2] and the nucleotide sequence of SHOX ET45 [SEQ ID No. 4].

E15. A DNA sequence according to embodiment E9 comprising the nucleotide sequence of SHOX ET93 [SEQ ID 
NO 2], the nucleotide sequence of SHOX ET45 [SEQ. ID. No. 4] and the nucleotide sequence of SHOX G108 
[SEQ ID 5]. 

E16. A DNA sequence according to any of embodiments E9 - E15 comprising the nucleotide sequences of SHOX G310 [SEQ ID NO. 3], SHOX ET93 [SEQ ID NO. 2], SHOX ET45 [SEQ ID No. 4] and SHOX G108 [SEQ ID NO. 5].
E17. A DNA sequence according to embodiment E16 further comprising the nucleotide sequence of SHOX Va [SEQ ID No. 6].

E18. A DNA sequence according to embodiment E16 further comprising the nucleotide sequence of SHOX Vb [SEQ ID No. 7].

E19. A DNA sequence according to embodiment E9 consisting essentially of the isolated genomic sequence of the PAR1 region identified in [SEQ ID No. 14].

E20. A DNA sequence comprising the nucleotide sequence of SHOX ET92 [SEQ. ID No. 9]. 
E21. A DNA sequence according to any of embodiments E9 - E20 whereby the DNA is a genomic or isolated DNA responsible for regulating human growth.

E22. A DNA sequence according to any of embodiments E9 - E21 whereby the DNA is a cDNA. 

E23. A cDNA according to embodiment E22 consisting essentially of the nucleotide sequence of SHOXa [SEQ ID No. 10] or SHOXb [SEQ ID NO. 12].

E24. A cDNA according to embodiment E22 consisting essentially of the nucleotide sequence of SHOT [SEQ ID No. 14].

E25. A human growth protein (transcription factor SHOXa) having the amino acid sequence given in [SEQ ID No. 
11] or a functional fragment thereof. 

E26. A human growth protein (transcription factor SHOXb) having the amino acid sequence given in [SEQ ID No. 13] or a functional fragment thereof.
E27. A human growth protein (transcription factor SHOT) having the amino acid sequence given in [SEQ ID NO: 16] or a functional fragment thereof.

E28. A cDNA encoding a protein according to embodiment E25, E26 or E27.
E29. A pharmaceutical composition comprising a protein according to any of embodiments E25 to E27. 
E30. A method for the treatment of short stature comprising administering to a subject in need thereof a therapeutically effective amount of a protein according to any of embodiments E25 to E27.

E31. Use of a protein according to any of embodiments E25 to E27 for the preparation of a pharmaceutical composition for the treatment of short stature.

E32. Use of a DNA sequence according to any of embodiments E1 - E24 for the preparation of a pharmaceutical composition for the treatment of disorders relating to mutations of the short stature gene.
E33. Use of a DNA sequence according to any of embodiments E1 - E24 for the preparation of a kit for the identification of individuals having a genetic defect responsible for diminished human growth.
E34. Use of a DNA sequence according to embodiment E33 for the identification of a gene responsible for short 
human stature. 

E35. Method for the determination of short stature on the basis of RNA or DNA molecules, wherein the biological sample molecule to be examined is amplified in the presence of two nucleotide probes completely or in part complementary to any of the DNA sequences mentioned in SEQ ID No. 2 to SEQ ID No. 7 and subsequently determined by a suitable detection system.

E36. Use of the method according to embodiment E35 for the identification of persons having a genetic defect 
responsible for short stature. 

E37. Transgenic animal transformed with a gene responsible for short stature containing a DNA sequence according to any one of embodiments E1 - E24.

E38. Cells transformed with a DNA sequence according to any one of embodiments E1 - E24. 
E39. Test system for identifying or screening pharmaceutical agents useful for the treatment of human short stature 
comprising a cell according to embodiment E38. 
E40. Method for identifying or screening candidates for pharmaceutical agents useful for the treatment of disorders relating to mutations in the short stature gene, comprising providing a test system according to embodiment E39 and determining variations in the phenotype of said cells or variations in the expression products of said cells after contacting said cells with said candidate pharmaceutical agents.

E41. An expression vector comprising a DNA molecule according to any of embodiments E1 - E8 which is capable of effecting the expression of the encoded polypeptide.

E42. A method for the in vivo treatment of human growth disorders related to at least one mutation in the SHOX or SHOT gene by gene therapy, comprising introducing into human cells an expression plasmid in which a DNA molecule according to any of embodiments E1 - E8 is incorporated downstream from the expression promoter that effects expression in a human host cell.

EP 1 260 228 A2

E43. A method according to embodiment E42 for the treatment of Turner syndrome or short stature. 
E44. Antibodies obtained by immunization of mammals using the transcription factors A, B or C or antigenic fragments thereof and isolating such antibodies from such mammals.
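Embodiment E35 amplifies the sample in the presence of two nucleotide probes complementary to SEQ ID No. 2 to 7. Reading these probes as a PCR primer pair is an interpretation. The Python sketch below shows the in-silico counterpart: it finds the forward primer on a template, then finds the binding site of the reverse primer downstream, and returns the predicted product. The two 20-mer primers are hypothetical and are simply cut from the ends of SEQ ID NO: 2. The sketch uses exact string matching only, so it ignores mismatches and mispriming, and it does not model the detection system.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (IUPAC N kept as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Template stretch from `forward` through the binding site of `reverse`, or None.

    `reverse` is written 5'->3' as it would be ordered, so its site on the given
    template strand is its reverse complement. Exact matches only.
    """
    template, forward = template.upper(), forward.upper()
    start = template.find(forward)
    if start < 0:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


SEQ_ID_2 = (
    "GGATTTATGA ATGCAAAGAG AAGCGCGAGG ACGTGAAGTC GGAGGACGAG GACGGGCAGA"
    "CCAAGCTGAA ACAGAGGCGC AGCCGCACCA ACTTCACGCT GGAGCAGCTG AACGAGCTCG"
    "AGCGACTCTT CGACGAGACC CATTACCCCG ACGCCTTCAT GCGCGAGGAG CTCAGCCAGC"
    "GCCTGGGGCT CTCCGAGGCG CGCGTGCAG"
).replace(" ", "")

# Hypothetical 20-mer primers taken from the two ends of SEQ ID NO: 2.
fwd = SEQ_ID_2[:20]
rev = reverse_complement(SEQ_ID_2[-20:])

product = predicted_amplicon(SEQ_ID_2, fwd, rev)
print(len(product))  # 209
```

In a diagnostic setting the template would be patient DNA. A missing product, or a product of unexpected length, would then be the signal of interest.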

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Rappold-Hoerbrand, Gudrun, Dr. 

(B) STREET: Hausackerweg 14 

(C) CITY: Heidelberg 

(E) COUNTRY: Germany 

(F) POSTAL CODE (ZIP): 69118

(A) NAME: Rao, Ercole 

(B) STREET: Odenwaldstrasse 11 

(C) CITY: Riedstadt-Erfelden 

(E) COUNTRY: Germany 

(F) POSTAL CODE (ZIP): 64560 

(ii) TITLE OF INVENTION: HUMAN GROWTH GENE AND SHORT STATURE GENE 
REGION 

(iii) NUMBER OF SEQUENCES: 16 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/027,633
(B) FILING DATE: 01-OCT-1996 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: EP 97100583.0 

(B) FILING DATE: 16-JAN-1997 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 60 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

Gln Arg Arg Ser Arg Thr Asn Phe Thr Leu Glu Gln Leu Asn Glu Leu
1 5 10 15

Glu Arg Leu Phe Asp Glu Thr His Tyr Pro Asp Ala Phe Met Arg Glu
20 25 30

Glu Leu Ser Gln Arg Leu Gly Leu Ser Glu Ala Arg Val Gln Val Trp
35 40 45

Phe Gln Asn Arg Arg Ala Lys Cys Arg Lys Gln Glu
50 55 60

(2) INFORMATION FOR SEQ ID NO: 2: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 209 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "exon II: ET93" 

(v) FRAGMENT TYPE: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

GGATTTATGA ATGCAAAGAG AAGCGCGAGG ACGTGAAGTC GGAGGACGAG GACGGGCAGA 60 

CCAAGCTGAA ACAGAGGCGC AGCCGCACCA ACTTCACGCT GGAGCAGCTG AACGAGCTCG 120 

AGCGACTCTT CGACGAGACC CATTACCCCG ACGCCTTCAT GCGCGAGGAG CTCAGCCAGC 180 

GCCTGGGGCT CTCCGAGGCG CGCGTGCAG 209 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 368 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "exon I: G310" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GTGATCCACC CGCGCGCACG GGCCGTCCTC TCCGCGCGGG GAGACGCGCG CATCCACCAG 60

CCCCGGCTGC TCGCCAGCCC CGGCCCCAGC CATGGAAGAG CTCACGGCTT TTGTATCCAA 120

GTCTTTTGAC CAGAAAAGCA AGGACGGTAA CGGCGGAGGC GGAGGCGGCG GAGGTAAGAA 180

GGATTCCATT ACGTACCGGG AAGTTTTGGA GAGCGGACTG GCGCGCTCCC GGGAGCTGGG 240

GACGTCGGAT TCCAGCCTCC AGGACATCAC GGAGGGCGGC GGCCACTGCC CGGTGCATTT 300

GTTCAAGGAC CACGTAGACA ATGACAAGGA GAAACTGAAA GAATTCGGCA CCGCGAGAGT 360

GGCAGAAG 368

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 58 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "exon III: ET45"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GTTTGGTTCC AGAACCGGAG AGCCAAGTGC CGCAAACAAG AGAATCAGAT GCATAAAG 58 
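The listing allows a consistency check between SEQ ID NO: 1 and the exon sequences. The Python sketch below assumes that exon II (SEQ ID NO: 2) is spliced directly to exon III (SEQ ID NO: 4). Under that assumption, it translates nucleotides 72-209 of SEQ ID NO: 2 followed by nucleotides 1-42 of SEQ ID NO: 4 with the standard genetic code. The result is the 60-residue homeobox domain of SEQ ID NO: 1, with 46 residues encoded in exon II and 14 in exon III. The coordinates were picked by inspection for this illustration and are not stated in the text.

```python
# Standard genetic code (NCBI translation table 1); codon bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate the complete codons of `dna` (spaces ignored) into one-letter code."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


# SEQ ID NO: 1 as listed (three-letter code), converted to one-letter code.
SEQ_ID_1 = "".join(THREE_TO_ONE[r] for r in (
    "Gln Arg Arg Ser Arg Thr Asn Phe Thr Leu Glu Gln Leu Asn Glu Leu "
    "Glu Arg Leu Phe Asp Glu Thr His Tyr Pro Asp Ala Phe Met Arg Glu "
    "Glu Leu Ser Gln Arg Leu Gly Leu Ser Glu Ala Arg Val Gln Val Trp "
    "Phe Gln Asn Arg Arg Ala Lys Cys Arg Lys Gln Glu"
).split())

SEQ_ID_2 = (
    "GGATTTATGA ATGCAAAGAG AAGCGCGAGG ACGTGAAGTC GGAGGACGAG GACGGGCAGA"
    "CCAAGCTGAA ACAGAGGCGC AGCCGCACCA ACTTCACGCT GGAGCAGCTG AACGAGCTCG"
    "AGCGACTCTT CGACGAGACC CATTACCCCG ACGCCTTCAT GCGCGAGGAG CTCAGCCAGC"
    "GCCTGGGGCT CTCCGAGGCG CGCGTGCAG"
).replace(" ", "")
SEQ_ID_4 = "GTTTGGTTCC AGAACCGGAG AGCCAAGTGC CGCAAACAAG AGAATCAGAT GCATAAAG".replace(" ", "")

exon2_part = SEQ_ID_2[71:]  # nucleotides 72-209 of SEQ ID NO: 2
exon3_part = SEQ_ID_4[:42]  # nucleotides 1-42 of SEQ ID NO: 4
homeodomain = translate(exon2_part + exon3_part)

print(len(exon2_part) // 3, len(exon3_part) // 3)  # 46 14
print(homeodomain == SEQ_ID_1)                      # True
```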

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 89 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "exon IV: G108"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GCGTCATCTT GGGCACAGCC AACCACCTAG ACGCCTGCCG AGTGGCACCC TACGTCAACA 60
TGGGAGCCTT ACGGATGCCT TTCCAACAG 89

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1166 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "exon: Va"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GTCCAGGCTC AGCTGCAGCT GGAAGGCGTG GCCCACGCGC ACCCGCACCT GCACCCGCAC 60

CTGGCGGCGC ACGCGCCCTA CCTGATGTTC CCCCCGCCGC CCTTCGGGCT GCCCATCGCG 120

TCGCTGGCCG AGTCCGCCTC GGCCGCCGCC GTGGTCGCCG CCGCCGCCAA AAGCAACAGC 180

AAGAATTCCA GCATCGCCGA CCTGCGGCTC AAGGCGCGGA AGCACGCGGA GGCCCTGGGG 240

CTCTGACCCG CCGCGCAGCC CCCCGCGCGC CCGGACTCCC GGGCTCCGCG CACCCCGCCT 300

GCACCGCGCG TCCTGCACTC AACCCCGCCT GGAGCTCCTT CCGCGGCCAC CGTGCTCCGG 360

GCACCCCGGG AGCTCCTGCA AGAGGCCTGA GGAGGGAGGC TCCCGGGACC GTCCACGCAC 420

GACCCAGCCA GACCCTCGCG GAGATGGTGC AGAAGGCGGA GCGGGTGAGC GGCCGTGCGT 480

CCAGCCCGGG CCTCTCCAAG GCTGCCCGTG CGTCCTGGGA CCCTGGAGAA GGGTAAACCC 540

CCGCCTGGCT GCGTCTTCCT CTGCTATACC CTATGCATGC GGTTAACTAC ACACGTTTGG 600

AAGATCCTTA GAGTCTATTG AAACTGCAAA GATCCCGGAG CTGGTCTCCG ATGAAAATGC 660

CATTTCTTCG TTGCCAACGA TTTTCTTTAC TACCATGCTC CTTCCTTCAT CCCGAGAGGC 720

TGCGGAACGG GTGTGGATTT GAATGTGGAC TTCGGAATCC CAGGAGGCAG GGGCCGGGCT 780

CTCCTCCACC GCTCCCCCGG AGCCTCCCAG GCAGCAATAA GGAAATAGTT CTCTGGCTGA 840

GGCTGAGGAC GTGAACCGCG GGCTTTGGAA AGGGAGGGGA GGGAGACCCG AACCTCCCAC 900

GTTGGGACTC CCACGTTCCG GGGACCTGAA TGAGGACCGA CTTTATAACT TTTCCAGTGT 960

TTGATTCCCA AATTGGGTCT GGTTTTGTTT TGGATTGGTA TTTTTTTTTT TTTTTTTTTT 1020

TGCTGTGTTA CAGGATTCAG ACGCAAAAGA CTTGCATAAG AGACGGACGC GTGGTTGCAA 1080

GGTGTCATAC TGATATGCAG CATTAACTTT ACTGACATGG AGTGAAGTGC AATATTATAA 1140

ATATTATAGA TTAAAAAAAA AATAGC 1166
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 625 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "exon Vb"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATGGAGTTTT GCTCTTGTCG CCCAGGCTGG AGTATAATGG CATGATCTCG ACTCACTGCA 60

ACCTCCGCCT CCCGAGTTCA AGCGATTCTC CTGCCTCAGC CTCCCGAGTA GCTGGGATTA 120 

CAGGTGCCCA CCACCATGTC AAGATAATGT TTGTATTTTC AGTAGAGATG GGGTTTGACC 180 

ATGTTGGCCA GGCTGGTCTC GAACTCCTGA CCTCAGGTGA TCCACCCGCC TTAGCCTCCC 240

AAAGTGCTGG GATGACAGGC GTGAGCCCCT GCGCCCGGCC TTTGTAACTT TATTTTTAAT 300 

TTTTTTTTTT TTTTAAGAAA GACAGAGTCT TGCTCTGTCA CCCAGGCTGG AGCACACTGG 360 

TGCGATCATA GCTCACTGCA GCCTCAAACT CCTGGGCTCA AGCAATCCTC CCACCTCAGC 420 

CTCCTGAGTA GCTGGGACTA CAGGCACCCA CCACCACACC CAGCTAATTT TTTTGATTTT 480 

TACTAGAGAC GGGATCTTGC TTTGCTGCTG AGGCTGGTCT TGAGCTCCTG AGCTCCAAAG 540 

ATCCTCTCAC CTCCACCTCC CAAAGTGTTA GAATTACAAG CATGAACCAC TGCCCGTGGT 600 

CTCCAAAAAA AGGACTGTTA CGTGG 625 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15577 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "HOX93"

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1498..1807

(D) OTHER INFORMATION: /function= "part of exon I (G310)"

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 3844..4068

(D) OTHER INFORMATION: /function= "pET92 region (first part)"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 4326..4437

(D) OTHER INFORMATION: /function= "pET92 region (second part)"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 4545..4619

(D) OTHER INFORMATION: /function= "pET92 region (third part)"

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 5305..5512

(D) OTHER INFORMATION: /function= "part of exon II (ET93)"

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 11620..11729

(D) OTHER INFORMATION: /function= "part of exon IV (G108)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



CTCTCCCTGT TGTGTCTCTC TTTCTCTCTC TCCATCTCTC TCCGTCTTTC CCCCTCTGTC 60

TCTTTCTCTG TCTCCATCCC TCTGTCTCTC CCTTTCTCTC TGTCTTTCCT TGTCTCTCTC 120

TTTCTCTCTC TCTCTCCATC TCTCTCTCTC CCGGTCTCTC TCTCTCCATC TCCCCGTCTC 180

TCCGTTTCTC TCTCTGCCTC TCCCTGTCTG TCTCTCTCTT TGTGTGTGTT ACACACACCC 240

CAACCCACCG TCACTCATGT CCCCCCACTG CTGTGCCATC TCACACAAGT TCACAGCTCA 300

GCTGTCATCC TGGGTCCCCA GGCCCCGCCG GGGAGGAAGA TGCGCCGTGG GGTTACGGGA 360

GGAAGGGGAC TCCGGGCCTC CTGGTGCCCC ACTTTATTTG CAGAAGGTCC TTGGCAGGAA 420

CCGTGACGCG TTTGGTTTCC AGGACTTGGA AAACGAATTT CAGGTCGCGA TGGCGAGCAC 480

CGGCTTCCCC TGAAGCACAT TCAATAGCGA GAGGCGGGAG GGAGCGAGCA GGAGCATCCC 540

ACCATGAAAA CCAAAAACAC AAGTATTTTT TTCACCCGGT AAATACCCCA GACGCCAGGG 600

TGACAGCGCG GCGCTAAGGG AGGAGGCCTC GCGCCGGGGT CCGCCGGGAT CTGGCGCGGG 660

CGGAAAGAAT ATAGATCTTT ACGAACCGGA TCTCCCGGGG ACCTGGGCTT CTTTCTGCGG 720

GCGCTGGAAA CCCGGGAGGC GGCCCCGGGG ATCCTCGGCC TCCGCCGCCG CCGCCTCCCA 780

AGCGCCCGCG TCCCGGTTTG GGGACACCCG GCCCCTTCTT CTCACTTTCG GGGATTCTCC 840

AGCCGCGTTC CATCTCACCA ACTCTCCATC CAAGGGCGCG CCGCCACCAA CTTGGAGCTC 900

ATCTTCTCCC AAAATCGTGC GTCCCCGGGG CGCCCGGGTC CCCCCCCTCG CCATCTCAAC 960

CCCGGCGCGA CCCGGGCGCT TCCTGGAAAG ATCCAGGCGC CGGGCTCTGC GCTCCTCCCG 1020

GGAGCGAGGG CGGCCGGACA ACTGGGACCC TCCTCTCTCC AGCCGTGAAC TCCTTGTCTC 1080

TCTGTCTCTC TCTGCAGGAA AACTGGAGTT TGCTTTTCCT CCGGCCACGG AAAGAACGCG 1140

GGTAACCTGT GTGGGGGGCT CGGGCGCCTG CGCCCCCCTC CTGCGCGCGC GCTCTCCCTT 1200 

CCAAAAATGG GATCTTTCCC CCTTCGCACC AAGGTGTACG GACGCCAAAC AGTGATGAAA 1260

TGAGAAGAAA GCCAATTGCC GGCCTGGGGG GTGGGGGAGA CACAGCGTCT CTGCGTGCGT 1320 

CCGCCGCGGA GCCCGGAGAC CAGTAATTGC ACCAGACAGG CAGCGCATGG GGGGCTGGGC 1380 

GAGGTCGCCG CGTATAAATA GTGAGATTTC CAATGGAAAG GCGTAAATAA CAGCGCTGGT 1440

GATCCACCCG CGCGCACGGG CCGTCCTCTC CGCGCGGGGA GACGCGCGCA TCCACCAGCC 1500 

CCGGCTGCTC GCCAGCCCCG GCCCCAGCCA TGGAAGAGCT CACGGCTTTT GTATCCAAGT 1560 

CTTTTGACCA GAAAAGCAAG GACGGTAACG GCGGAGGCGG AGGCGGCGGA GGTAAGAAGG 1620

ATTCCATTAC GTACCGGGAA GTTTTGGAGA GCGGACTGGC GCGCTCCCGG GAGCTGGGGA 1680 

CGTCGGATTC CAGCCTCCAG GACATCACGG AGGGCGGCGG CCACTGCCCG GTGCATTTGT 1740 

TCAAGGACCA CGTAGACAAT GACAAGGAGA AACTGAAAGA ATTCGGCACC GCGAGAGTGG 1800

CAGAAGGTAA GTTCCTTTGC GCGCCGGCTC CAGGGGGGCC CTCCTGGGGT TCGGCGCCTC 1860 

CTCGCCACGG AGTCGGCCCC GCGCGCCCCT CGCTGTGCAC ATTTGCAGCT CCCGTCTCGC 1920 

CAGGGTAAGG CCCGGGCCGT CAGGCTTTGC CTAAGAAAGG AAGGAAGGCA GGAGTGGACC 1980 

CGACCGGAGA CGCGGGTGGT GGGTAGCGGG GTGCGGGGGG ACCCAGGGAG GGTCGCAGCG 2040 

GGGGCCGCGC GCGTGGGCAC CGACACGGGA AGGTCCCGGG CTGGGGTGGA TCCGGGTGGC 2100 

TGTGCCTGAA GCCGTAGGGC CTGAGATGTC TTTTTCATTT TCTTTTTCTT TCCTTTCCTT 2160 

TTTTTGTTTG TTTGTTTGTT TGTTTGAGAC AGAGTCTCGC TCTGTCCCCC AGGCTGGAGT 2220 

GCAGTGGTGC GATCTCGGCT CACTGCAACC TCCGCCTCCT GGGTTCAAGC GATTCTCCTG 2280 

CCTCAGCCTC CCCAGTAGCT GGGATTACAG GCATGCACCA CCACGCCTGG CTAATTTTTG 2340 

TGCTTTTAGT AAAGACGGGG ATTCACCATG TTGGCCAGGC TGGTCTCGAA CTCCTGACCT 2400 

CAGGTGATCC ACCCGCCTCG GCCTCCCAAA GTGCTGGGAT GACAGGCGTG AGGCACCGCG 2460 

CCCGGCCTGG GTCCTGACGG CTTAGGATGT GTGTTTCTGT CTCTGCCTGT CTGCCTTGTA 2520 

TTTACGGTCA CCCAGACGCA CAGAGGAGCC GTCTCCACGC GCCTTCCCAG CGCTCAGCGC 2580 

CTGCCGGGCC CCCGGAGATC ACGGGAAGAC TCGAGGCTGC GTGGTAGGAG ACGGGAAGGC 2640 

CCCGGGTCAG CTCGGTTCTG TTTCNCTTTA AGGAACCCTT CATTATTATT TCATTGTTTT 2700 

CCTTTGAACG TCGAGGCTTG ATCTTGGCGA AAGCTGTTGG GTCCATAAAA ACCACTCCCG 2760 

TGAGCGGAGG TGGCCGGGAT CTGGATGGGG CGCGAGGGGC CCCGGGGAAG CTGGCGGCTT 2820 

CGCGGGCGCG TCCTAAGTCA AGGTTGTCAG AGCGCAGCCG GTTGTGCGCG GCCCGGGGGN 2880 

AGCTCCCCTC TGGCCCTTCC TCCTGAGACC TCAGTGGTGG GTCGTCCCGT GGTGGAAATC 2940 

GGGGAGTAAG AGGCTCAGAG AGAGGGGCTG GCCCCGGGGA TCTCTGTGCA CACACGACAA 3000 

CTGGGCGGCA TACATCTTAA GAATAAAATG GGCTGGCTGT GTCGGGGCAC AGCTGGAGAC 3060 

GGCTATGGAC GCCTGTTATG TTTTCATTAC AAAGACGCAG AGAATCTAGC CTCGGCTTTT 3120

GCTGATTCGC AAAGTTGAGG TGCGAGGGTG AATGCCCCAA AGGTAATTCT [illegible] 3180

CTGGGGCTAC CTGCTCTCCG GGGCCCTGCA TTTGGGGTGT GGAGTGGCCC [illegible] 3240

CCCTTGTATT CGTAGGAGGC ACCAGGCAGC TTCCCAAGGC CCTGACTTTG [illegible] 3300

AAGCTGTGGC TACGGTTTAC AAAGCAGTCC CCGGTTTCTG ACCGTCTAAG [illegible] 3360

CCAGCCTGCC TTTGACAGTG AGAGGAGTTC CTCCCTACAC ACTGCTGCGG [illegible] 3420

CTGTAATTCA TACACAGAGA GTTGGCCTTC CTGGACGCAA GGCTGGGAGC [illegible] 3480

CCTGCGTGTA ATTTAAGAGG GTTCGCANGC GCCCGGCGGC CGCTTCTGNT [illegible] 3540

TTTGGTTGTC CTTCNGCAAA CACCGTTTTG CTCCTCTNGN AACTCTCTCT [illegible] 3600

TGGCCNGTNG GACCCGGGNA NGAGCAAAGT GTCCTCCAGA CCNTTTTGAA ANGTGAGAGG 3660

AAAATAAAGA CCAGGCCAAA NNGACCCAGG GCCACAGGAG AGGAGACAGA GAGTCCCCGT 3720

TACATTTTNC CCCTTGGCTG GGTGCAGAAA GACCCCCGGG CCAGGACTGC CACCCAGGCT 3780

ACTATTTATT CATCAGATCC AAGTTAAATC GAGGTTGGAG GGCAGGGGAG AGTCTGAGGT 3840

TACCGTGGAA GCCTGGAGTT TTTGGGNAAC AGCGTGTCCC CGCCGAGCCT GGGAGCCCGT 3900

GGGTTCTGCA AAGCCTGCGG GTGTTTGAGG ACTTTGAAGA CCAGTTTGTC AGTTGGGCTT 3960

AATTNCCTGG GGTTCAGACT TAGAGAAATG AAGGAGGGAG AGCTGGGGTC [illegible] 4020

AACGATTCAC TTGGGGGGAA GGAATGGAGT GTTCTTGCAG GCACATGTCT [illegible] 4080

GAAACAGAAT GTGAAATCCA CGTTGGAGTA AGCGTCCAGC GCTGAATGTA [illegible] 4140

GGGTGGGAGG GCCCTGGTGT GGATCGTGGA AGGNAAGAAA GACAGAACAG [illegible] 4200

TTTACCCCGT TNCCCTGTAG ACACCCTGGA TTTGTCAGCT TTGCAAGCTT [illegible] 4260

GCGGCCTTGC CTGTGCCCCT TTGAGACTGT TTCCAGACTA AACTTCCAAA TGTCAGCCCC 4320

TTACCCTTGA CAGCAAGGGA CATCTCATTA GGGCATCGCG TGCTTCTCAT [illegible] 4380

GCAGGCCCNG AGATAGGAAN CANGAGGGGC NGTTGGNAGA TGCNCACTTC [illegible] 4440

GGGNTTGAAG GGGANGCGAN GGGANGACNA CCTTTTANCT TAAACCCCTN [illegible] 4500

CAGAGAGGNC TGAATGTCTA AAATGAGGAA GAAAAGGTTT TTCACCTGGA [illegible] 4560

GGCTGAGTCT TCTGCCCNTT CTGACNTCCC CCAGCAAATA CAGACAGGTC [illegible] 4620

TGGAGATGAG AAAGTGCCAT TTTTGGCACA CTCTGGTGGG GTAGGTGCCC [illegible] 4680

GAAAAANGTG GGAANNGGAG AGATTTCTGN CGCACGCGGT TCAGCCCCCA [illegible] 4740

GCNGCATTCN AGGNTACTCA GACGCGGTTC TGCTGTTCTG CTGAGAAACA [illegible] 4800

AGGGGCTCCT AGCTCCGCCA GATCGCGGAG GGACCCCCAG CCCTCCTGCG CTGCAGCGGT 4860

GGGGATAGCG TCTCTCCGTA GGCCTAGAAT CTGCAACCCG CCCCGGGTCC TCCCCGTGTC 4920

CTTCCCGGGC GTCCCGCCGG GGATCCCACA GTTGGCAGCT CTTCCTCAAA TTCTTTCCCT 4980

TAAAAATAGG ATTTGACACC CCACTCTCCT TAAAAAAAAA AAATAAGAAA AAAAGGTTAG 5040

GTTATGTCAA CAGAGGTGAA GTGGATAATT GAGGAAACGA TTCTGAGATG AGGCCAAGAA 5100



AACAACGCTC GTGCAAAGCC CAGGTTTTTG GGAAAGCAGC GAGTATCCTC CTCGGCTTTT 5160 

GCGTTATGGA CCCCACGCAG TTTTTGCGTC AAAGCGCATT GGTTTTCGAG GGCCCCCTTT 5220 

CCACCGCGGG ATGCACGAAG GGGTTCGCCA CGTTGCGCAA AACCTCCCCG GCCTCAGCCC 5280

TGTGCCCTCC GCTCCCCACG CAGGGATTTA TGAATGCAAA GAGAAGCGCG AGGACGTGAA 5340 

GTCGGAGGAC GAGGACGGGC AGACCAAGCT GAAACAGAGG CGCAGCCGCA CCAACTTCAC 5400 

GCTGGAGCAG CTGAACGAGC TCGAGCGACT TTTTGACGAG ACCCATTACC CCGACGCCTT 5460

CATGCGCGAG GAGCTCAGCC AGCGCCTGGG GCTTTCCGAG GCGCGCGTGC AGGTAGGAAC 5520 

CCGGGGGCGG GGGCGGGGGG CCCGGAGCCA TCGCCTGGTC CTCGGGAGCG CACAGCACGC 5580 

GTACAGCCAC CTGCGCCCGG GCCGCCGCCG TCCCCTTCCC GGAGCGCGGG GAGGTTGGGT 5640

GAGGGACGGG CTGGGGTTCC TGGACTTTTG GAGACGCCTG AGGCCTGTAG GATGGGTTCA 5700 

TTGCGTTTGT TTTTCACCAA CAGCAAACAA ATATATATAC ATATATATTA TACAAATAAC 5760 

AAATAAATAT ATATGTTATA CAGATGGGTA TATTGTATAT ATTATAGATA TTTGTTCGTC 5820

CTTGGTGCAA AGACACCCGG TGAACCCATA TATTGGCTCC TGACTGCCTT CGGTTCCCCT 5880 

GGGATTGGTT ATAGGGGCAA CACATGCAAA CAAAACTTTC CCTGGATTAT ACTTAGGAGA 5940 

CGAAGCTACA GATGCGTTTG ATCCAGAGTG TTTTACAAGA TTTTTCATTT AAAAAAAAAT 6000

GTGTCTTTTG GCCCCTGATT CCCCTCCGTC TTCCCGTGTG GCTGCATTGA AAAGGTTTCC 6060 

TTAGGATGAA AGGAGAGGGG TGTCCTCTGT CCCTAGGTGG AGAGAAACAG GGTCTTCTCT 6120 

TTCCTCCGTT TTTTCACCTA CCGTTTCTAT CTCCCTCCTC CCCTCTCCAG CCCTGTCCTC 6180

TGCTACAAAC CACCCCCTCC TCCCTCCGGC TGTGGGGAGC GCAGGAGCAC GTTGGGCATC 6240 

TGGATGAGCG GNAGACTATT AGCGGGGCAC GGGGGCTCCC CGAGGAGCGC GCGAATTCAC 6300 

GCTGCCCCAT GAGACCAGGC ACCGGGGGGC GGAGGGGCCT TGGGTGTCCG CAGAGGGACG 6360 

GGCGGGCAGA GCCTTCCTCC GCATTCTAAA CATTCACTTA AAGGTATGAG TTTANTTTCA 6420 

GGGGTGCTGC TGGGAGAGCC TCCAAATGGC TTCTTCCAGC CCCTGCCTGA CAGTTCAGCT 6480 

CCCCTGGAAG GTCAACTCCT CTAGTCCTTT CTCCTGGTTC TGGGCAGGAC AGAAGTGGGG 6540 

GGAGGGAGAG AGAGAGAGAG AGAGAGAGAG ACGGTCAGGA TCCCCGGACC CTGGGGAACC 6600 

CGTCAAAAAT AAATGAAATT AAGATTGCCG ACCAGAGAGA GAACCGTGAC AAAGCAAACG 6660 

GCGTTCAAAG CAAAGAGACG AACTGAAAGC CCGTTCCCGT AGGACTGGTT ATGAGGTCAA 6720 

CACATTCAAA CACAGCTTGC TCTGGATTTT GCTGAGCAGA GGAAGATACA GATGCATTTG 6780 

ATCCAAAGTG TGTTACATCT TTCATTATAT GTGTGTCTAT ATATATAAAC ATATATAAAT 6840 

ATATAAACAT ACATAAATGT ATGTAAATAT ATATAATCTA TATACATATA TAAATATATA 6900 

AACACATATA TAATATATAA ATCTATAAAC ATATATAATA TATAAACATA AATATATAAA 6960 

CATATATAAT ATATAAATAT ATTAACATAT ATAAAATATG TATAAATATA TATAAACATA 7020 

TAAACATATA TAAATATATA AACATATAAA TATATAAACA TATATAAATA TATACAAACA 7080 

TATTGTATAT ATATAAATAT ATATAAAAAC ATATATATAC ATATAAAAAT ATATATAAAC 7140

ATATATACAT ATAAAGAAAT ATATATAAAC ATATATACAT ATAAATATAC ATATATAAAC 7200

ATATATATAC ATAAAATATA TATAAACATA TATACATATA AAAATATATA TATATTAACA 7260

TATATATACA TATAAAAATA TATATATTAA CATATATATA CATATAAAAA TATATATATA 7320

TTTTTGGCCC CTGATTCCCT TCGGTTCCTG TGGGATGGGT GATTGAGTCA [illegible] 7380

ACACAACTTT TCCATCGATG TTGCTTAGGA GATGAGGATA CAGATGCGTT TGATGGAGAG 7440

GGTTTTACAA GCTCTTTCAT TTAAATATAT ATATATATAT ATATATATTT TTTGGCTCCT 7500

GATTCTCTTC CGTCTTCCCA TGTGGCTGCA TTTTAAAAGG CTTCCCTAAG ATCGTTACGA 7560

TTAAATCAAC CCTCCCCAGG CATCTTTACC GAGGGCTGTG GTCCCCAAAG CGATACAGCC 7620

CAGGAGGGAG AGAGGCTTTG GTGACTTGGA GGAAGGACTG TGTCCCTCCT TAGGGCGTCT 7680

GTGGCCTCAG TGAGGGAAGG AAGCTGCATC AGACAGGGGT TTCCTCGCTG TCCACCCCTC 7740

TGGCAGAAGA TGGATTGGGC TGCCCCGNTA TAAATTAATG AAAAGATTAA AGTTTCGCTA 7800

AAGGGGACAT CGAGTTTATG TGTCATCTCC TGGTGNTCTG TGTGCCNTGG GATNCTGCAA 7860

TATATCCCAN NGCCCTTGAT GNNNTACTGT TTNCTATAAA AANNTAAATN TACTTGTNNA 7920

ATTTAANTTC CNNNACACTA TTTNCTTTCC NNGTNAGTCT NATTANCCGA NCGAGAGCAN 7980

CGNTTAGTTN CAGCTNGCGG AAAATTGGTT GTGGGGTGTG TGCGGACCCC NGAGNAACGC 8040

CCNNTAAAAT NAAAGACAAA NTCNGGGGAC AAGNCTNGGG GGTTATCGNN ATTGCNNAGG 8100

GGTCGNCATG AAAANTTTAA CGACGGTAAA TAATAATAAA AANNCAAACA [illegible] 8160

AATAAAAGAC ATAATTCTCC NNATCGCCGC GGGGGGAAAG GATCCTATAG [illegible] 8220

TGCGCTTTGA GGGGTCATAA AAATCAATTA GTTCCAACAC CCACGTCCCG [illegible] 8280

ACGGGGACGA GCAGGGACAG AAAAAGAAAC CATATTTGAA TCCCATCTCT [illegible] 8340

TTGGGTCACA TGCGTCTCAG TACAGCCCGT CCCGTGCTGT GACCGGATAG [illegible] 8400

TACTGTGGAA ATTTGCTGTA AATAAATTGA GCATCCGATA GAAGCTGTTG [illegible] 8460

TTTTATTTTT AGCGTGGCCC TGCAAAGTCG TATCACCCAG CTGTCAGGCT TCTAATCGAA 8520

AGTTATGAGA CCACGGTGAG GGGCAGGCGG TAATTTAATT ACAACAAATA [illegible] 8580

TATGGCGCAG AGCTAAATTA AATGTCATTA TTCACTGTCT GTNAATGGNA AATCAAAANN 8640

GGAAATCGCA NTTACGGNCA TTTGGGNNAA ANGAAAGCGG GGNAGTGCTC [illegible] 8700

NNGAAATAAC TGTCTTAAGC AGTGTCACAC ACTTCACTTA CCATATTCGN GGCCTNAATT 8760

GGAANNTGGA TCGTNNGAAT CACTCCNAAG ACTNGATTTA TTANGCGCTT CACGNCAGCN 8820

NGGCNTAATT CATCNACTTN NGTATTCTTC ATCNNNNATT TTTTTTTTTC CTCTCNNGCC 8880

GTGTTNNGAA GGGAGAGTGA ATGAGGCTTT CCACGTTTCA GGAGGATTTT CTTTTTTGAA 8940

AAATGCCCTT CCAGAGGCTT TTGGGTGGCT GGCTTGCTTT CTGGGCCCTG GAGGANGACA 9000

GGCGGANGAG TCCAGGTGGG CATGGAGAGG CACAGTGGCA GGTCACCTGG ATGGTCAGTG 9060

GAGGTGGAGG TCTGAAGGCG CCAGCTTTGG AAATTATTGG TGAATTTCGA TGTCAGCACC 9120
AGGNCAGGGG CCTTTTTGGC GGGGGTGTGA GGGANGGATG ANCTTTGCTG GGAAANNCAG 9180

GATCAGGTTC TCCAGGCGCA CTGCAGCCCG GTAGGACCCA CTTTGGAAAT GAAAAGCCAG 9240

TTNCCGAAAG CTGGGCTGGA AGCTTCCGTG TTGGGTTCAA GAGCAAGTTC ACGTTGCGCT 9300

GTGTAGACTC CTGGCTGCTC CCAAACTCTG AGGGTTTTCT GAGGTTCCCT TCATAGGGGC 9360

ACCGGCCCTG GGCCATGCAC AGTGCGTAAG GGTGGCTGTG GGCCGAGGGA CCCAGCACGT 9420

GTTTTGCCCA CAACAGCCGG AGTGACTGGT TCACTCACCG CCTTGGCGGA GGACGCCTGT 9480

TCTCTGGACG AATCATTTCT CTTGGGTGGT GACTGCCTTG TGGGTCAAGG TGCAGGTTTT 9540

CTGCCACAGA AAACCTGTTA GGAGGAATTA AGCGACTAAG ACTGTCAGGG AGGTGGTGGT 9600

GGGGGANGAG GNAGGGGGTG GTGTCCAGAT TACCAGGCAT AGGCTAAACT GCCTGCACTC 9660

TCCAGCTGGT CTGTCTGTGG AGGAGGGGAT TGTCAATACT GGGAGAGCAG AGGAGGCTCG 9720

TAGGAGGTGA GAGGGGGTGG AATTTGCATG CAAATCTTCA CATGAGGCCT GTGTGAATTT 9780

CTCCAGCCTC CTGAGGGTCC CCTGCGCTAT TGCACTCAAC TTCTTGATAG TTTACCCCAA 9840

GACTCAGAAG TCCTTAGAGG GGCAGAATGC CCCCACCACA AAGCCTGCTA TCCTTGGGCG 9900

TCCTCAGGAC CCTTGGTCAT GAATGGGACC CTTTCATGTA TGGGGACCCT TGGTAATATG 9960

AATGGGACGC CTTCAGCTCC CCAGGGCTTC CGAGGAGGCC GAGAAGGGCA AAGACACTTC 10020

CGAGGAGGCC GAGAAGGGCA AAGACATTTT CTGGGCTTGG TGTGTCAAGA GCTAGATTGG 10080

AGAAGGGGCT GGATTTGGAA CTCTTTAGCC ATCAGCTCAC CCTCTCCGTT TGTGGCTAAA 10140

GTCTGAAGGT GGAAACTTCG GTTCTCCTAC AGGGTCTACA GGAGTTGGGG GGCGGGGCGC 10200

CCACACAGAA CGCTGGAAAG TTCGACAGTC CACTTCCACT GGCTCGGAAC TCACTTTTTC 10260

ACCTTAAGTT CATCAGCGGT AACGCATAGG TCTCACTTAG GCAGGGCACG GATGATTTAA 10320

CAATTTCTAC TTCTAGGTCA GGTGCGGTGG CTCACACCTC TAATCCCAGC ACTTTGGGAG 10380

GCCCAGGAGG GTGGATCGCT TGAGGTCAGG AGTTTGAGAC CAGCCTGGCC AACATGGTGA 10440

AACCCCGTCT CTACTAAAAT ACGAAAATTA GCCAGGCATG GTGGTGAGCA CCTGTAATTC 10500

CAGCTACTCG GGAGGCTGAG GCAGGAGAAT CGCTTGAACC TGGGAGGTGG ACGTTGCAGT 10560

GAGGTGAGAT CACACCACTG CACTCCAGCC TGGATGAGAG AGCAAGACTC TGTCTCAAAA 10620

ACAAAATAAA ACAAAAACAA AACAAAAATC AAAAAAGAAA ACCCAATTTC CAGTTCTAGG 10680

CCAGGTGCAG TGGCTCACGC CTGTCATCCC AGCACTTTGG GAGGCCCAGG AGGGTGGATC 10740




GCTTGAGGTC 


AGGAGTTCGA 
GTTAGCTGGG 


GACCAGCCTG 


GCCAACATGG 
TGCGCCTGTA 


TGAAACCCCA 
ATCCCAGCTA 


TCTTTACTAA 
CTCGGGAAGC 


10800 
10860 




AAATACAAAC 




TGAGGCTGGA GAATTGCTTG AATCTGGGAG GTGGAGGTTG CAGGGAGGCG AGATAGTGCC 10920

ACTGCAGTCC AGCCTGGACC AGAGAGCAAG ACTCCGTCTC AAAAACAAAA GAAAGCAAAA 10980

ACAAAAAACA AGAGACCAGC CTGGCCAACA TGGTGAAACC GCGTCTTTAC TAAAATACAA 11040

AATTAGCCGG GCATGGTGGT GGGCACCTGT AGTCCCAGCT ACTCGGGAGG CTGAGGCAGG 11100





a o a * TOOOMUP 

AGAATGGCTT 


n aa o otooo a 


flfl r POf±'h.ClC T T r P QPAfiTflAGPP 


GAGATAGTGC 


CACTGCACTC 


11160 




CAGCCTGtaGC 


OA O* O AOOOA 

Lj AL. Ao Avj L. vjA 


r i nr ,r P j 7Y27A f T v P f p pap.aappapp 


ACCACCACAA 


CAAAACAAAA 


11220 




nn * n ah TV Too 

CAAAAAATCC 


A A A R A A AOOO 

AAAAAAAULV 


OAATTTPPAP. f APTAP/JTAfZ 


1 V_.xAVj 1 VJrt A \J\_ 


AGGGCTGGAG 


11280 




iv ^ % ^ tv oo r*r^f~* 

ACAGAGGGGC 


ojOTTi A/^rTV^TV^ 


TVK/7p<vpr , a ppznVTACJT'p a 

IbUiC^tLLA LLA 1 V AVj i. L. A 


1 \_ V_ V_ /A.VJ3\_ X 


rrPANGAGGT 


11340 




GCAAAGTGCT 


nvwnro a o>o 0 
TGGTTC AGC C 


TO ATPOiOTV IVP P27V TOP'TVPPT 




VMUlvVJUl 


11400 




ACAGGGCTCT 


mo tv 0 * totv»t 
TCACATCTC 1 


oTv^TooTTtoTi mpppps. TAfVT 

C lV iv»C 1 IV 1 aNUCCUAAovj i 


1 1 1 «v-v«rV 




11460 




OOO A S^(TW^ 

GCCAAGTGCC 


OJATO ?\ A A O A TV O 


AO A ATO AOAf P.OATA. A 7Af2P.T 
AvjAAIVAljAi oLAl/i/vivAj 1 


VtvtO ivj i\,vjv>\j 




11520 




CCTGAAGCTG 


LtajLtvjA ICC lu 


/-< mOO 7A OO 7A flTi /^TATV2T2/Tn r rT , rt 


AP AA(V5Tfif"P 


GGCTACACCC 


11580 




AGGACCACCA 


C ACTG AC AC C 


HW , 1V , PPT'T | T /^/^TA/^TAOTAft/^r* 


\J IVn IV- i A VTVJ 


f5P AP At^PP A A 


11640 




CCACCTAGAC 


GCCTGCCNGA 


^nW^OAPPrMT TAr'/^TV^TA TaO TAT 
GM\j<iC AC LC 1 AC Vj Iv-HALA 1 




ppp TATTIPPTT 1 


1 1700 




moo t\ t\ 0 a ^-1 om 

TCCAACAGGT 


AGCTC AC TTT 


TTCTTCCTCT GNAAGAJX-CC 


T A<^T2P A PPTY1 


PTY^P TP PPTT 


I 1 7fi0 

II 'OU 




CCCCTTTCCC 


CTATTTGCTG 


^m/^KTV/Vlvo Tvr»TVOTV , r , 'T»TVi^ 

CCGCATCCTG ACACTCC TAG 


1CCC 1CCC lv» 


PPPPTY2PAP.A 
LtLL JoUAvjA 


1 1 fl90 




CTTCTCAGCT 


PfPrPHVP A O A 

viGCCC 1 I AQjA 


TV TV TA TA TA A PCpip PTTTTV^P TA O 
AAAAAAGLC 1 till l\-L.\jrAvjr 




PAP^PAPPTT 


1 1 fiRO 




GGCACCTATG 


■tv tv tv frvi^ tv ooom 

AAATCAGGCT 


GGGCCAGGCG GGGTGGCTCA 


pippnvTpa'Ti 
CACC 1\» 1V-A 1 


pppaftpiPTT 
CCL.AOL.AL. X 1 


11? 




TOOO A r*f*f*r* 74 

TGGGAGGCCA 


AOiOTTAOO TV O 

AGGTTAGGAG 




VvAIAVjCAAAA 


P^PTYJTPTPT 1 


19000 




7> /""im A TV TV A TV fTlTV 

ACTAAAAATA 


0>AAAAAAAAA 

CAAAAAAAAA 


TVT»TA TA^ 1 7Vr5/V5TA ^TV2/* , 'TWi , IV2i^ 

1 1 AALAbuuA i. \jVj AviO 


rtO A PP TVIT A A 


TPPPAf2PTAP 


15060 




TTGGGAGGCT 


oj tv ooo iv oj"« a o 
GAGGCAGGAG 


AA 1CAC 1 i\»A ALCC wjUALtG 


ppp TAnPTTYlP 
Cv*v»Avivj 1 1 VtV. 




1Z 1 z u 




/> jv mOO'TV 1 OO TV 

GATCGTGCCA 


TTGCACTCCA 


GGCIvtGGCGA C AasAo 1\jAVjA 


PfPTVTPTV TA 


A A A A ATA A. AT 
AAAAA1AAA1 


1 91 AO 




TV A A fTITV TV A rp TV TV 

AAATAAATAA 


ATGTAAAAAA 


ATAAAAATAG GTCGGGCACG 


G IajGC ICACVj 


mpTPTl ATPP 


i 99dn 

1ZZ *1U 




CAGCACTTTG 


/■"* tv tv /"^r^ooo tv o 

GAAGGCCGAG 


GTGGGTGGAT GACAGGGrCA 


AvjAGAI IvtAvj 


SPP1VPPPTV5P 

ACCA IV L. 


1 9^nn 




OOTl * /"I TV fTV^/IO 

CCAACATGGC 


tv tv tv umAonnm 

AAAATGC CGT 


/*i rrv*> m a m Ti tv jv a a AmTV/r a a a a 

C TC T AC T AAA AAATACAAAA 


ATTAGGCGGG 


PPTWWVPP 

CGTGGTGGCG 




GGTGCCTGTA 


tv m/>no a o o m a 

ATCCCAGC TA 


CTCGGGAGGC TGAGGCAGGA 


O A A fPP'PPTVTV n l 

GAATCGGTTvj 


a a /^pr'prp a tv 
AACCLGotaA 1 






GCGGAGGTTG 


tv /^nv tv o ooo 
CAGTGAGCGG 


>v o tv rrv" 1 a tv tv> a r^rr\r*r t a Piv , r' 
AGATCACATC ACTGCACTCC 


AvivjL. TUCA»C A 


AO A srippp A 
ACAAvjAvjCvjA 






AACTGCGTCT 


rrtTv O a a miv tv tv n\ 

TACAATAAAT 


TV TV IV m A TV m TV A AfTIAAAWAAAO 

AAATAGATAA ATAAATAAAC 


A A ATT A a A PfpTt 

AAATAAACTT 


T AC TTT AG AA 




TV ^ TV TV TVITIOOOTI 

ACAAATCCCT 


0> WO r^r* mo iiuiun 

GTCCGTGTTT 


GTCTTTTCAC CTGTCCTGCA 


/"W*A A A AO A A 

GotaAAAACAA 


A AO ATA A A AT 

AACA i AAAA I 


xzouu 




GTCAAGGCAA 


TV m tv f>tr\ » /*s m/^ 

ATAGTAGTGA 


l IU lim^ TV imil/^/V^ A A A A TV O A A 

TTTCATTCCG GGAAAAAGAA 


AGTGGATGTT 


mPPPKTP A OO 

TGCCTTCACC 


X^obu 




0 Twivrvt moomo 

CTTTCTCGTC 


CTTCCTCTGG 


TGCTCCTCAN GGCCCAIMGGG 


MRPJAPJ^PTVy 


A A TAPTMPirift 
AAAVj 1 NL. ALaA 


1 9*79fl 
1Z / ZU 


OO A AO A TV TV O A 

GGAAGAAAGA 


CGGGGCTGGG 


GGGGOGGG iC v-Vj IWwvjALL 


PlfiPPlfifiTl 
LAbbLnbbL A 


TP-TTppplvjaT 


1 97fi0 
1 z / ou 




TTCCNTGTCT 


TO TA OXTTTO TV TV 

1VACNTTCAA 


AGNAuVjuvjCL. ivVjWv- IV i 




PTAPftP-TTTp 
v_ A t\\~\J\j X X X\, 


1 9ftd0 




CTTTCCCNGA AGAGTTNCCC CTTTGTGAGC TTACGGCTTC GGAGTGAACC TCGGTGCAAC 12900
CTGTTATTAA AACACACAGA GGCTAATGCC AGCAAAAACA CGCCCCCCGC TCCTGGTTTC 12960
AGAGGGAAGA AAAAAATTCA TAAGCACGGC CATGCTTTTC TAATAAAAAT TCATTAAATA 13020
ATCGTTATAA GGGATGAAGC CGGGAGGGGA GAGGAGAGGA ACACAATCAA GAGACTTTCT 13080
TTGAACTTTT TCTCCCTGCT TCAAATACAA AGCAATCTTC TGTGGGCCTG GGCCTGGGGG 13140
GTTTCCCCCT TTCTCTGCAG CCCATTGGGA GGAAGAAAAT GCTTCCCTGA ANGTTGCTGC 13200
AAAATTGTTT CTGTTTTTCT TTTCTTTTTC TTTTTTTTTT TTTTTTGAGA CGGAGTCTCG 13260
CTCTGTCACC AGGCTGGAGT GCAATGGTAT GATCTCAGCT CACTGCAACC TCCACGTTCC 13320
TGTTTCAAGT CATTCTCCTG CCTCAGCCTC CTGAGTAGCT GGGACTACAG GCGCCCGCCA 13380
CCACGCCCGG CTAGTGTTTG TATTTTTAGA AAAGACAGGG TTTCCCCATG TTGGCCAGGC 13440
TGGTCTTGAA CTCCTGTCCT CAAGTGATCT GCCTGCCTCG GCCTCCCAAA GTGCTGTGTT 13500
TCTGTTTTTC TTTCCCCGCT TTCTTAGGAG GCCATCGGGA AGAATAAAAT GCTTTCCTTG 13560
AAGTTGATGC AAAATTGTTT CTGTTTTTCT TTTCTCTTTT CTTTCTTTTT GAGATGGAGT 13620
CTCGCTCTTT CACCCAGGCT GGAGGGCAGT GGCGCGACCT CGGCTCACTG CAACCTCCGC 13680
CTCCCGGGTT CAAGCGATTC TCCTGCCTCA GCCTCCGGAG TAGCTGGGAT TACAGGCACC 13740
TGCCACTATG CCTGGCTAAT TTTATTATTT TTAGTAGAGA CGGGGTTTCA CCATGTTGGC 13800
CAGGCTGGTC TCAAACTCCT GACCTCAGGT GATCCGCCCG CCTCGCCTCC CAAAGTGATG 13860
GGATGANCAG GNCATNGAGC NCACCGTGCC CGGCCCTCTA ACTCTTTACC AGACATAAAG 13920
TCTCCNNTTC CCCTTTCTAA ATGTATATAT TGTGTTTTTA AAAGTTAACA GCAGGGATCC 13980
CACCTCATTN CCCCGCTNCT CTCCCCAAGA CCTGTCCTGC ACGTTGCACA CAGCAGGTGT 14040
GCCCTGGACA TATCCCAAAC CCACGCTGAA AGAAAGAGGG TCTCACTACA CGTATGATAT 14100
CTGTGNATCC TTTAAACATC TCCGTGGCTT CCAGGCAACA CAGCCATAAA TAGGAATCTC 14160
ATGTCTGACA TGATACCGGG ACCATGTATG GGNAAATTCT GGGTGTGAAG TTCCAGCTAC 14220
CCCCGCAGAG GCANCCATTG CATACCCTCC AGAAACTCCC CTGCCGTTNC AAGCCAAAGA 14280
CACAACACAA ACAGCNTCCG AGAGAGGGTG TCATTGAAAA TCAATACCAT CATAAGAGCA 14340
CACAGCACCG TCTTTCTCTT CTGCCCGTTG ATACACAATT ATGAGCAATT TGCTAACACT 14400
GACAACTCGT GGCAAGAACA GGTCGTGTTG ATACGGTTGC CTCGTGAGGA CCCATCTGTC 14460
TTCTGGGGTC TTGCCTGGAA CGGAGATCGG AGTTCAGGGT GGCTAATAGA ATCATTACTC 14520
ACCTAGGGAC ACAGAATNAT GAGGGTTACC CCCAGTTAAG TGCATACAGT CAAACGGACG 14580
GCTGCTCTGG AAGGTACAGT GACGTGAACA GCTTTTATGA AATGCCTAGA TCTGGACCTT 14640
CCATACCTGA GCCACCGTTC CAAAGCACTG GGCGTTTTTC AGATACTTTC ATGAGAAATG 14700
TTGTCAACAC CGCAAGTTTG CAGTACACAG TCTGAAAGAT ATTCTTGTAT ATGTAGATGT 14760
CTGTAGATGC CCTGAAGGTG TGTAGACTTT AGACACCCAG AAGGTGTGTA GATGTCTGTA 14820
GACACCTTCT ATGTGTGTAG ATGTCTGTAG ACGCCCTGCA GGTGTGTAGA TATATCTAGA 14880
TGGTCTGCCT GTGTATGATA CAGGCTAAAA AGACATTTGT GGTGGACACT AGTTGATTAT 14940
TTAGGACTAT GAGATGGGAA AGGAAGNAGC AACCAGCAGT GAAAGGCATG TGGTGGGTGG 15000
GGGGTTGGCA TTGCAGTGGG GTCCTCNTGA NGCAGGTGAC ACCCACTATA GGGCTGCCCT 15060
TGGNATGGAC GCTTTGTNGA AGCTGTTTGA TTTCACCACA CCAAGCCTGG AGGCACGGAC 15120
ATTCCAGGAT GGTGAGGAGT CTGCAAAGGA GGAGATTGGA GGAGGTGCAA TATCCCTAGA 
GTACGAGAGA TGAGATAGGA GAGCTGTATA AATAGCACTA CCAGCCGGAT GCGGTGGCTC 
ACGCCTGTCA TCCCAGCACT TTAGGAGGCT GAGGCAGGCG GATCACCTGA GGTCAGGAGT 
TCCAGAACAG CCTGGCCAAC ACAATGAAAC CCCATCTTTA CTAAAAATAC AAGATTAGCT 
GGGCACGGTG TCTCACGCCT GTCATCCCTG CACTTTGGGA GGTCGAGGTG CGCAGATCAT 
GAGGTCAGTT TGGCCAACGC GGCGAAACCC CGTCTCTACT AAAAATACAA AAAAGTAGCC 
GGGCGTGGTG GTGGGCACCT GTAGTCCCAG CTACTAGGGA GGCTGAGGCA GGAGAATCGC 
TTGAACCCGG ATGCGGACAT TGCAGTGAGC CGAGATC 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 753 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "ET92 gene segment"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGTGGAAGCC TGGAGTTTTT GGGAACAGCG TGTCCCCGCC GAGCCTGGGA GCCCGTGGGT 60
TCTGCAAAGC CTGCGGGTGT TTGAGGACTT TGAAGACCAG TTTGTCAGTT GGGCTCAATT 120
CCTGGGGTTC AGACTTAGAG AAATGAAGGA GGGAGAGCTG GGGTCGTCTC CAGGAAACGA 180
TTCACTTGGG GGGAAGGAAT GGAGTGTTCT TGCAGGCACA TGTCTGTTAG GAGGTGAAAC 240
AGAATGTGAA ATCCACGTTG GAGTAAGCGT CCAGCGCTGA ATGTAGCTCG GGGTGGGGTG 300
GGAGGGCCCT GGTGTGGATC GTGGAAGGAA GAAAGACAGA ACAGGGTGCT AGTATTTACC 360
CCGTTCCCTG TAGACACCCT GGATTTGTCA GCTTTGCAAG CTTCTTGGTT GCAGCGGCCT 420
TGCCTGTGCC CCTTTGAGAC TGTTTCCAGA CTAAACTTCC AAATGTCAGC CCCTTACCCT 480
TGACAGCAAG GGACATCTCA TTAGGGCATC GCGTGCTTCT CATCTGTGCT CAGCAGGCCC 540
GAGATAGGAA CAGAGGGGCG TTGGAGATGC CACTTCCACC AGCCCTGGGT TGAAGGGGAG 600
CGAGGGAGAC ACCTTTTACT TAAACCCCTG AGCTTGGTCA GAGAGGCTGA ATGTCTAAAA 660
TGAGGAAGAA AAGGTTTTTC ACCTGGAAAC GCTTGAGGGC TGAGTCTTCT GCCCTTCTGA 720
CTCCCCCAGC AAATACAGAC AGGTCACCAA CTA 753
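In these listings each nucleotide row holds up to six blocks of ten bases, and in the numbered listings (for example SEQ ID NO: 10) each row ends with the position of its last base. SEQ ID NO: 9 is 753 base pairs long, which corresponds to twelve full rows of 60 bases and a final row of 33 bases. The Python sketch below reproduces this layout for illustration only; it is not part of the original listing, and the helper name `format_rows` and its parameters are hypothetical.

```python
from typing import List


def format_rows(seq: str, block: int = 10, per_row: int = 6) -> List[str]:
    """Lay out a nucleotide sequence as listing rows (hypothetical helper).

    Each row holds up to `per_row` blocks of `block` bases separated by
    spaces, followed by the 1-based position of the last base in that row.
    """
    bases = "".join(seq.split())  # ignore any existing spacing or line breaks
    width = block * per_row
    rows = []
    for start in range(0, len(bases), width):
        chunk = bases[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(" ".join(blocks) + " " + str(start + len(chunk)))
    return rows


# Last row of SEQ ID NO: 9 (33 bases), laid out on its own.
print(format_rows("CTCCCCCAGC AAATACAGAC AGGTCACCAA CTA"))
# ['CTCCCCCAGC AAATACAGAC AGGTCACCAA CTA 33']

# A 753-base sequence yields 12 full rows plus one 33-base row.
print(len(format_rows("A" * 753)))  # 13
```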
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1890 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 



(A) DESCRIPTION: /desc = "SHOXa" 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 91..968



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GTGATCCACC CGCCGCACGG GCCGTCCTCT CCGCGCGGGG AGACGCGCGC ATCCACCAGC 60 

CCCGGCTGCT CGCCAGCCCC GGCCCCAGCC ATG GAA GAG CTC ACG GCT TTT GTA 114 

Met Glu Glu Leu Thr Ala Phe Val 
1 5 

TCC AAG TCT TTT GAC CAG AAA AGC AAG GAC GGT AAC GGC GGA GGC GGA 162 
Ser Lys Ser Phe Asp Gln Lys Ser Lys Asp Gly Asn Gly Gly Gly Gly
10 15 20 

GGC GGC GGA GGT AAG AAG GAT TCC ATT ACG TAC CGG GAA GTT TTG GAG 210 
Gly Gly Gly Gly Lys Lys Asp Ser Ile Thr Tyr Arg Glu Val Leu Glu
25 30 35 40

AGC GGA CTG GCG CGC TCC CGG GAG CTG GGG ACG TCG GAT TCC AGC CTC 258 
Ser Gly Leu Ala Arg Ser Arg Glu Leu Gly Thr Ser Asp Ser Ser Leu 
45 50 55 

CAG GAC ATC ACG GAG GGC GGC GGC CAC TGC CCG GTG CAT TTG TTC AAG 306
Gln Asp Ile Thr Glu Gly Gly Gly His Cys Pro Val His Leu Phe Lys
60 65 70 

GAC CAC GTA GAC AAT GAC AAG GAG AAA CTG AAA GAA TTC GGC ACC GCG 354 
Asp His Val Asp Asn Asp Lys Glu Lys Leu Lys Glu Phe Gly Thr Ala 
75 80 85 

AGA GTG GCA GAA GGG ATT TAT GAA TGC AAA GAG AAG CGC GAG GAC GTG 402 
Arg Val Ala Glu Gly Ile Tyr Glu Cys Lys Glu Lys Arg Glu Asp Val
90 95 100 

AAG TCG GAG GAC GAG GAC GGG CAG ACC AAG CTG AAA CAG AGG CGC AGC 450
Lys Ser Glu Asp Glu Asp Gly Gln Thr Lys Leu Lys Gln Arg Arg Ser
105 110 115 120 

CGC ACC AAC TTC ACG CTG GAG CAG CTG AAC GAG CTC GAG CGA CTC TTC 498 
Arg Thr Asn Phe Thr Leu Glu Gln Leu Asn Glu Leu Glu Arg Leu Phe
125 130 135

GAC GAG ACC CAT TAC CCC GAC GCC TTC ATG CGC GAG GAG CTC AGC CAG 546 
Asp Glu Thr His Tyr Pro Asp Ala Phe Met Arg Glu Glu Leu Ser Gln
140 145 150 

CGC CTG GGG CTC TCC GAG GCG CGC GTG CAG GTT TGG TTC CAG AAC CGG 594
Arg Leu Gly Leu Ser Glu Ala Arg Val Gln Val Trp Phe Gln Asn Arg
155 160 165 

AGA GCC AAG TGC CGC AAA CAA GAG AAT CAG ATG CAT AAA GGC GTC ATC 642 
Arg Ala Lys Cys Arg Lys Gln Glu Asn Gln Met His Lys Gly Val Ile
170 175 180

TTG GGC ACA GCC AAC CAC CTA GAC GCC TGC CGA GTG GCA CCC TAC GTC 690 
Leu Gly Thr Ala Asn His Leu Asp Ala Cys Arg Val Ala Pro Tyr Val 
185 190 195 200 

AAC ATG GGA GCC TTA CGG ATG CCT TTC CAA CAG GTC CAG GCT CAG CTG 738
Asn Met Gly Ala Leu Arg Met Pro Phe Gln Gln Val Gln Ala Gln Leu
205 210 215

CAG CTG GAA GGC GTG GCC CAC GCG CAC CCG CAC CTG CAC CCG CAC CTG 786 
Gln Leu Glu Gly Val Ala His Ala His Pro His Leu His Pro His Leu
220 225 230 

GCG GCG CAC GCG CCC TAC CTG ATG TTC CCC CCG CCG CCC TTC GGG CTG 834 
Ala Ala His Ala Pro Tyr Leu Met Phe Pro Pro Pro Pro Phe Gly Leu 
235 240 245 

CCC ATC GCG TCG CTG GCC GAG TCC GCC TCG GCC GCC GCC GTG GTC GCC 882 
Pro Ile Ala Ser Leu Ala Glu Ser Ala Ser Ala Ala Ala Val Val Ala
250 255 260 

GCC GCC GCC AAA AGC AAC AGC AAG AAT TCC AGC ATC GCC GAC CTG CGG 930 
Ala Ala Ala Lys Ser Asn Ser Lys Asn Ser Ser Ile Ala Asp Leu Arg
265 270 275 280 

CTC AAG GCG CGG AAG CAC GCG GAG GCC CTG GGG CTC TG ACCCGCCGCG 978 
Leu Lys Ala Arg Lys His Ala Glu Ala Leu Gly Leu 
285 290 

CAGCCCCCCG CGCGCCCGGA CTCCCGGGCT CCGCGCACCC CGCCTGCACC GCGCGTCCTG 1038 

CACTCAACCC CGCCTGGAGC TCCTTCCGCG GCCACCGTGC TCCGGGCACC CCGGGAGCTC 1098 

CTGCAAGAGG CCTGAGGAGG GAGGCTCCCG GGACCGTCCA CGCACGACCC AGCCAGACCC 1158 

TCGCGGAGAT GGTGCAGAAG GCGGAGCGGG TGAGCGGCCG TGCGTCCAGC CCGGGCCTCT 1218 

CCAAGGCTGC CCGTGCGTCC TGGGACCCTG GAGAAGGGTA AACCCCCGCC TGGCTGCGTC 1278 

TTCCTCTGCT ATACCCTATG CATGCGGTTA ACTACACACG TTTGGAAGAT CCTTAGAGTC 1338 

TATTGAAACT GCAAAGATCC CGGAGCTGGT CTCCGATGAA AATGCCATTT CTTCGTTGCC 1398 

AACGATTTTC TTTACTACCA TGCTCCTTCC TTCATCCCGA GAGGCTGCGG AACGGGTGTG 1458 

GATTTGAATG TGGACTTCGG AATCCCAGGA GGCAGGGGCC GGGCTCTCCT CCACCGCTCC 1518 

CCCGGAGCCT CCCAGGCAGC AATAAGGAAA TAGTTCTCTG GCTGAGGCTG AGGACGTGAA 1578 

CCGCGGGCTT TGGAAAGGGA GGGGAGGGAG ACCCGAACCT CCCACGTTGG GACTCCCACG 1638 

TTCCGGGGAC CTGAATGAGG ACCGACTTTA TAACTTTTCC AGTGTTTGAT TCCCAAATTG 1698 

GGTCTGGTTT TGTTTTGGAT TGGTATTTTT TTTTTTTTTT TTTTTTGCTG TGTTACAGGA 1758 

TTCAGACGCA AAAGACTTGC ATAAGAGACG GACGCGTGGT TGCAAGGTGT CATACTGATA 1818 

TGCAGCATTA ACTTTACTGA CATGGAGTGA AGTGCAATAT TATAAATATT ATAGATTAAA 1878 

AAAAAAATAG CA 1890 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 292 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Glu Glu Leu Thr Ala Phe Val Ser Lys Ser Phe Asp Gln Lys Ser
1 5 10 15

Lys Asp Gly Asn Gly Gly Gly Gly Gly Gly Gly Gly Lys Lys Asp Ser
20 25 30 

Ile Thr Tyr Arg Glu Val Leu Glu Ser Gly Leu Ala Arg Ser Arg Glu
35 40 45 

Leu Gly Thr Ser Asp Ser Ser Leu Gln Asp Ile Thr Glu Gly Gly Gly
50 55 60 

His Cys Pro Val His Leu Phe Lys Asp His Val Asp Asn Asp Lys Glu 
65 70 75 80 

Lys Leu Lys Glu Phe Gly Thr Ala Arg Val Ala Glu Gly Ile Tyr Glu
85 90 95 

Cys Lys Glu Lys Arg Glu Asp Val Lys Ser Glu Asp Glu Asp Gly Gln
100 105 110 

Thr Lys Leu Lys Gln Arg Arg Ser Arg Thr Asn Phe Thr Leu Glu Gln
115 120 125 

Leu Asn Glu Leu Glu Arg Leu Phe Asp Glu Thr His Tyr Pro Asp Ala 
130 135 140 

Phe Met Arg Glu Glu Leu Ser Gln Arg Leu Gly Leu Ser Glu Ala Arg
145 150 155 160

Val Gln Val Trp Phe Gln Asn Arg Arg Ala Lys Cys Arg Lys Gln Glu
165 170 175 

Asn Gln Met His Lys Gly Val Ile Leu Gly Thr Ala Asn His Leu Asp
180 185 190 

Ala Cys Arg Val Ala Pro Tyr Val Asn Met Gly Ala Leu Arg Met Pro 
195 200 205 

Phe Gln Gln Val Gln Ala Gln Leu Gln Leu Glu Gly Val Ala His Ala
210 215 220 

His Pro His Leu His Pro His Leu Ala Ala His Ala Pro Tyr Leu Met 
225 230 235 240 

Phe Pro Pro Pro Pro Phe Gly Leu Pro Ile Ala Ser Leu Ala Glu Ser
245 250 255 

Ala Ser Ala Ala Ala Val Val Ala Ala Ala Ala Lys Ser Asn Ser Lys 
260 265 270 

Asn Ser Ser Ile Ala Asp Leu Arg Leu Lys Ala Arg Lys His Ala Glu
275 280 285 

Ala Leu Gly Leu 
290 
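The CDS annotation of SEQ ID NO: 10 can be checked against SEQ ID NO: 11 by translating the coding sequence codon by codon with the standard genetic code. The following Python sketch is illustrative only and not part of the original listing; the function name `translate` and the way the codon table is built are assumptions of this example. Applied to the first 24 codons of the SHOXa CDS (nucleotides 91 to 162 of SEQ ID NO: 10), it returns the first 24 residues of SEQ ID NO: 11 in one-letter code.

```python
# Illustrative sketch (not part of the original listing): translate an in-frame
# coding sequence with the standard genetic code (NCBI translation table 1).
BASES = "TCAG"
# Amino acids in TCAG x TCAG x TCAG codon order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codons until the first stop codon.

    Whitespace (as used in the listing layout) is ignored; codons containing
    characters other than A, C, G, T (e.g. N) are rendered as "X".
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 24 codons of the SHOXa CDS (SEQ ID NO: 10, nucleotides 91-162).
shoxa_start = (
    "ATG GAA GAG CTC ACG GCT TTT GTA "
    "TCC AAG TCT TTT GAC CAG AAA AGC AAG GAC GGT AAC GGC GGA GGC GGA"
)
print(translate(shoxa_start))  # MEELTAFVSKSFDQKSKDGNGGGG
```

The one-letter result corresponds to "Met Glu Glu Leu Thr Ala Phe Val Ser Lys Ser Phe Asp Gln Lys Ser Lys Asp Gly Asn Gly Gly Gly Gly" above.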

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1354 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "SHOXb" 
(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 91..768

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

GTGATCCACC CGCCGCACGG GCCGTCCTCT CCGCGCGGGG AGACGCGCGC ATCCACCAGC 60 

CCCGGCTGCT CGCCAGCCCC GGCCCCAGCC ATG GAA GAG CTC ACG GCT TTT GTA 114 
Met Glu Glu Leu Thr Ala Phe Val
1 5

TCC AAG TCT TTT GAC CAG AAA AGC AAG GAC GGT AAC GGC GGA GGC GGA 162 
Ser Lys Ser Phe Asp Gln Lys Ser Lys Asp Gly Asn Gly Gly Gly Gly
10 15 20

GGC GGC GGA GGT AAG AAG GAT TCC ATT ACG TAC CGG GAA GTT TTG GAG 210 
Gly Gly Gly Gly Lys Lys Asp Ser Ile Thr Tyr Arg Glu Val Leu Glu
25 30 35 40

AGC GGA CTG GCG CGC TCC CGG GAG CTG GGG ACG TCG GAT TCC AGC CTC 258 
Ser Gly Leu Ala Arg Ser Arg Glu Leu Gly Thr Ser Asp Ser Ser Leu
45 50 55

CAG GAC ATC ACG GAG GGC GGC GGC CAC TGC CCG GTG CAT TTG TTC AAG 306 
Gln Asp Ile Thr Glu Gly Gly Gly His Cys Pro Val His Leu Phe Lys
60 65 70

GAC CAC GTA GAC AAT GAC AAG GAG AAA CTG AAA GAA TTC GGC ACC GCG 354
Asp His Val Asp Asn Asp Lys Glu Lys Leu Lys Glu Phe Gly Thr Ala
75 80 85

AGA GTG GCA GAA GGG ATT TAT GAA TGC AAA GAG AAG CGC GAG GAC GTG 402 
Arg Val Ala Glu Gly Ile Tyr Glu Cys Lys Glu Lys Arg Glu Asp Val
90 95 100

AAG TCG GAG GAC GAG GAC GGG CAG ACC AAG CTG AAA CAG AGG CGC AGC 450 
Lys Ser Glu Asp Glu Asp Gly Gln Thr Lys Leu Lys Gln Arg Arg Ser
105 110 115 120

CGC ACC AAC TTC ACG CTG GAG CAG CTG AAC GAG CTC GAG CGA CTC TTC 498
Arg Thr Asn Phe Thr Leu Glu Gln Leu Asn Glu Leu Glu Arg Leu Phe
125 130 135

GAC GAG ACC CAT TAC CCC GAC GCC TTC ATG CGC GAG GAG CTC AGC CAG 546 
Asp Glu Thr His Tyr Pro Asp Ala Phe Met Arg Glu Glu Leu Ser Gln
140 145 150

CGC CTG GGG CTC TCC GAG GCG CGC GTG CAG GTT TGG TTC CAG AAC CGG 594 
Arg Leu Gly Leu Ser Glu Ala Arg Val Gln Val Trp Phe Gln Asn Arg
155 160 165

AGA GCC AAG TGC CGC AAA CAA GAG AAT CAG ATG CAT AAA GGC GTC ATC 642
Arg Ala Lys Cys Arg Lys Gln Glu Asn Gln Met His Lys Gly Val Ile
170 175 180

TTG GGC ACA GCC AAC CAC CTA GAC GCC TGC CGA GTG GCA CCC TAC GTC 690 
Leu Gly Thr Ala Asn His Leu Asp Ala Cys Arg Val Ala Pro Tyr Val 
185 190 195 200



AAC ATG GGA GCC TTA CGG ATG CCT TTC CAA CAG ATG GAG TTT TGC TCT 738
Asn Met Gly Ala Leu Arg Met Pro Phe Gln Gln Met Glu Phe Cys Ser
205 210 215

TGT CGC CCA GGC TGG AGT ATA ATG GCA TGA TCTCGACTCA CTGCAACCTC 788 
Cys Arg Pro Gly Trp Ser Ile Met Ala *
220 225

CGCCTCCCGA GTTCAAGCGA TTCTCCTGCC TCAGCCTCCC GAGTAGCTGG GATTACAGGT 848
GCCCACCACC ATGTCAAGAT AATGTTTGTA TTTTCAGTAG AGATGGGGTT TGACCATGTT 908
GGCCAGGCTG GTCTCGAACT CCTGACCTCA GGTGATCCAC CCGCCTTAGC CTCCCAAAGT 968
GCTGGGATGA CAGGCGTGAG CCCCTGCGCC CGGCCTTTGT AACTTTATTT TTAATTTTTT 1028
TTTTTTTTTA AGAAAGACAG AGTCTTGCTC TGTCACCCAG GCTGGAGCAC ACTGGTGCGA 1088
TCATAGCTCA CTGCAGCCTC AAACTCCTGG GCTCAAGCAA TCCTCCCACC TCAGCCTCCT 1148
GAGTAGCTGG GACTACAGGC ACCCACCACC ACACCCAGCT AATTTTTTTG ATTTTTACTA 1208
GAGACGGGAT CTTGCTTTGC TGCTGAGGCT GGTCTTGAGC TCCTGAGCTC CAAAGATCCT 1268
CTCACCTCCA CCTCCCAAAG TGTTAGAATT ACAAGCATGA ACCACTGCCC GTGGTCTCCA 1328
AAAAAAGGAC TGTTACGTGG AAAAAA 1354

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 226 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Glu Glu Leu Thr Ala Phe Val Ser Lys Ser Phe Asp Gln Lys Ser
1 5 10 15

Lys Asp Gly Asn Gly Gly Gly Gly Gly Gly Gly Gly Lys Lys Asp Ser 
20 25 30 

Ile Thr Tyr Arg Glu Val Leu Glu Ser Gly Leu Ala Arg Ser Arg Glu
35 40 45 

Leu Gly Thr Ser Asp Ser Ser Leu Gln Asp Ile Thr Glu Gly Gly Gly
50 55 60 

His Cys Pro Val His Leu Phe Lys Asp His Val Asp Asn Asp Lys Glu 
65 70 75 80 

Lys Leu Lys Glu Phe Gly Thr Ala Arg Val Ala Glu Gly Ile Tyr Glu
85 90 95 

Cys Lys Glu Lys Arg Glu Asp Val Lys Ser Glu Asp Glu Asp Gly Gln
100 105 110 

Thr Lys Leu Lys Gln Arg Arg Ser Arg Thr Asn Phe Thr Leu Glu Gln
115 120 125 

Leu Asn Glu Leu Glu Arg Leu Phe Asp Glu Thr His Tyr Pro Asp Ala 
130 135 140

Phe Met Arg Glu Glu Leu Ser Gln Arg Leu Gly Leu Ser Glu Ala Arg
145 150 155 160 

Val Gln Val Trp Phe Gln Asn Arg Arg Ala Lys Cys Arg Lys Gln Glu
165 170 175 

Asn Gln Met His Lys Gly Val Ile Leu Gly Thr Ala Asn His Leu Asp
180 185 190

Ala Cys Arg Val Ala Pro Tyr Val Asn Met Gly Ala Leu Arg Met Pro 
195 200 205 

Phe Gln Gln Met Glu Phe Cys Ser Cys Arg Pro Gly Trp Ser Ile Met
210 215 220 

Ala *
225
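According to SEQ ID NO: 11 and SEQ ID NO: 13, SHOXa and SHOXb are identical through residue 211 (Gln) and differ from residue 212 onward (Val in SHOXa, Met in SHOXb). The short Python sketch below, which is illustrative and not part of the listing, finds the first differing position between two proteins given as one-letter strings; the helper name `first_divergence` is hypothetical, and the fragments used are residues 193 to 216 of each protein transcribed from the two listings.

```python
from itertools import zip_longest
from typing import Optional


def first_divergence(a: str, b: str, start: int = 1) -> Optional[int]:
    """Return the residue number of the first position where a and b differ.

    `start` is the residue number of the first character of both strings.
    Strings of unequal length differ at the first position past the shorter one.
    Returns None if the strings are identical.
    """
    for offset, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return start + offset
    return None


# Residues 193-216 of SHOXa (SEQ ID NO: 11) and SHOXb (SEQ ID NO: 13).
shoxa = "ACRVAPYVNMGALRMPFQQVQAQL"
shoxb = "ACRVAPYVNMGALRMPFQQMEFCS"
print(first_divergence(shoxa, shoxb, start=193))  # 212
```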

(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32367 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "COSMID: LLNOYC03*M*34F5"


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

TTTCTCTGTC TCCATCCCTC TGTCTCTCCC TTTCTCTCTG TCTTTCCTTG TCTCTCTCTT 60 

TCTCTCTCTC TCTCCATCTC TGTCTCTCCC TGTCTCTCTC TCTCCATCTC CCCGTCTCTC 120

CGTTTCTCTC TCTGCCTCTC CCTGTCTGTC TCTCTCTTTC TGTGTCTTAC ACACACCCCA 180 

ACCCACCGTC ACTCATGTCC CCCCACTGCT GTGCCATCTC ACACAAGTTC ACAGCTCAGC 240 

TGTCATCCTG GGTCCCCAGG CCCCGCCGGG GAGGAAGATG CGCCGTGGGG TTACGGGAGG 300

AAGGGGACTC CGGGCCTCCT GGTGCCCCAC TTTATTTGCA GAAGGTCCTT GGCAGGAACC 360 

GTGACGCGTT TGGTTTCCAG GACTTGGAAA ACGAATTTCA GGTCGCGATG GCGAGCACCG 420 

GCTTCCCCTG AAGCACATTC AATAGCGAGA GGCGGGAGGG AGCGAGCAGG AGCATCCCAC 480

CATGAAAACC AAAAACACAA GTATTTTTTT CACCCGGTAA ATACCCCAGA CGCCAGGGTG 540 

ACAGCGCGGC GCTAAGGGAG GAGGCCTCGC GCCGGGGTCC GCCGGGATCT GGCGCGGGCG 600 

GAAAGAATAT AGATCTTTAC GAACCGGATC TCCCGGGGAC CTGGGCTTCT TTCTGCGGGC 660

GCTGGAGACC CGGGAGGCGG CCCCGGGGAT CCTCGGCCTC CGCCGCCGCC GCCTCCCAAG 720 

CGCCCGCGTC CCGGTTTGGG GACACCCGGC CCCTTCTTCT CACTTTCGGG GATTCTCCAG 780 

CCGCGTTCCA TCTCACCAAC TCTCCATCCA AGGGCGCGCC GCCACCAACT TGGAGCTCAT 840

CTTCTCCCAA GATCGTGCGT CCCCGGGGCG CCCGGGTCCC CCCCCTCGCC ATCTCAACCC 900 

CGGCGCGACC CGGGCGCTTC CTGGAAAGAT CCAGGCGCCG GGCTCTGCGC TCCTCCCGGG 960 

AGCGAGGGCG GCCGGACGAC TGGGACCCTC CTCTCTCCAG CCGTGAACTC CTTGTCTCTC 1020

TGTCTCTCTC TGCAGGAAAA CTGGAGTTTG CTTTTCCTCC GGCCACGGAG AGAACGCGGG 1080 

TAACCTGTGT GGGGGGCTCG GGCGCCTGCG CCCCCCTCCT GCGCGCGCGC TCTCCCTTCC 1140 

AAAAATGGGA TCTTTCCCCC TTCGCACCAA GGTGTACGGA CGCCAAACAG TGATGAAATG 1200

AGAAGAAAGC CAATTGCCGG CCTGGGGGGT GGGGGAGACA CAGCGTCTCT GCGTGCGTCC 1260

GCCGCGGAGC CCGGAGACCA GTAATTGCAC CAGACAGGCA GCGCATGGGG GGCTGGGCGA 1320 

GGTCGCCGCG TATAAATAGT GAGATTTCCA ATGGAAAGGC GTAAATAACA GCGCTGGTGA 1380

TCCACCCGCG CGCACGGGCC GTCCTCTCCG CGCGGGGAGA CGCGCGCATC CACCAGCCCC 1440 

GGCTGCTCGC CAGCCCCGGC CCCAGCCATG GAAGAGCTCA CGGCTTTTGT ATCCAAGTCT 1500 

TTTGACCAGA AAAGCAAGGA CGGTAACGGC GGAGGCGGAG GCGGCGGAGG TAAGAAGGAT 1560

TCCATTACGT ACCGGGAAGT TTTGGAGAGC GGACTGGCGC GCTCCCGGGA GCTGGGGACG 1620 

TCGGATTCCA GCCTCCAGGA CATCACGGAG GGCGGCGGCC ACTGCCCGGT GCATTTGTTC 1680 

AAGGACCACG TAGACAATGA CAAGGAGAAA CTGAAAGAAT TCGGCACCGC GAGAGTGGCA 1740

GAAGGTAAGT TCCTTTGCGC GCCGGCTCCA GGGGGGCCCT CCTGGGGTTC GGCGCCTCCT 1800 

CGCCACGGAG TCGGCCCCGC GCGCCCCTCG CTGTGCACAT TTGCAGCTCC CGTCTCGCCA 1860 

GGGTAAGGCC CGGGCCGTCA GGCTTTGCCT AAGAAAGGAA GGAAGGCAGG AGTGGACCCG 1920

ACCGGAGACG CGGGTGGTGG GTAGCGGGGT GCGGGGGGAC CCAGGGAGGG TCGCAGCGGG 1980 

GGCCGCGCGC GTGGGCACCG ACACGGGAAG GTCCCGGGCT GGGGTGGATC CGGGTGGCTG 2040 

TGCCTGAAGC CGTAGGGCCT GAGATGTCTT TTTCATTTTC TTTTTCTTTC CTTTCCTTTT 2100

TTTGTTTGTT TGTTTGTTTG TTTGAGACAG AGTCTCGCTC TGTCCCCCAG GCTGGAGTGC 2160 

AGTGGTGCGA TCTCGGCTCA CTGCAACCTC CGCCTCCTGG GTTCAAGCGA TTCTCCTGCC 2220 

TCAGCCTCCC CAGTAGCTGG GATTACAGGC ATGCACCACC ACGCCTGGCT AATTTTTGTG 2280 

CTTTTAGTAA AGACGGGGAT TCACCATGTT GGCCAGGCTG GTCTCGAACT CCTGACCTCA 2340 

GGTGATCCAC CCGCCTCGGC CTCCCAAAGT GCTGGGATGA CAGGCGTGAG GCACCGCGCC 2400 

CGGCCTGGGT CCTGACGGCT TAGGATGTGT GTTTCTGTCT CTGCCTGTCT GCCTTGTATT 2460 

TACGGTCACC CAGACGCACA GAGGAGCCGT CTCCACGCGC CTTCCCAGCG CTCAGCGCCT 2520 

GCCGGGCCCC CGGAGATCAC GGGAAGACTC GAGGCTGCGT GGTAGGAGAC GGGAAGGCCC 2580 

CGGGTCAGCT CGGTTCTGTT TCCTTTAAGG AACCCTTCAT TATTATTTCA TTGTTTTCCT 2640 

TTGAACGTCG AGGCTTGATC TTGGCGAAAG CTGTTGGGTC CATAAAAACC ACTCCCGTGA 2700 

GCGGAGGTGG CCGGGATCTG GATGGGGCGC GAGGGGCCCC GGGGAAGCTG GCGGCTTCGC 2760 

GGGCGCGTCC TAAGTCAAGG TTGTCAGAGC GCAGCCGGTT GTGCGCGGCC CGGGGGAGCT 2820 

CCCCTCTGGC CCTTCCTCCT GAGACCTCAG TGGTGGGTCG TCCCGTGGTG GAAATCGGGG 2880 

AGTAAGAGGC TCAGAGAGAG GGGCTGGCCC CGGGGATCTC TGTGCACACA CGACAACTGG 2940 

GCGGCATACA TCTTAAGAAT AAAATGGGCT GGCTGTGTCG GGGCACAGCT GGAGACGGCT 3000 

ATGGACGCCT GTTATGTTTT CATTACAAAG ACGCAGAGAA TCTAGCCTCG GCTTTTGCTG 3060 

ATTCGCAGAG TTGAGGTGCG AGGGTGAATG CCCCAAAGGT AATTCTTCCT AAGACTCTGG 3120 

GGCTACCTGC TCTCCGGGGC CCTGCATTTG GGGTGTGGAG TGGCCCCGGG AAATAGCCCT 3180 

TGTATTCGTA GGAGGCACCA GGCAGCTTCC CAAGGCCCTG ACTTTGTCGA AGCAGAAAGC 3240 

TGTGGCTACG GTTTACAAAG CAGTCCCCGG TTTCTGACCG TCTAAGAGGC AGGAGCCCAG 3300

CCTGCCTTTG ACAGTGAGAG GAGTTCCTCC CTACACACTG CTGCGGGCAC CCGGCACTGT 3360

AATTCATACA CAGAGAGTTG GCCTTCCTGG ACGCAAGGCT GGGAGCCGCT TGAGGGCCTG 3420

CGTGTAATTT AAGAGGGTTC GCAGCGCCCG GCGGCCGCTT CTGTGGGGTT GCTTTTTGGT 3480

TGTCCTTCGC AGACACCGTT TTGCTCCTCT GAACTCTCTC TTCTCCCCCT GGCCGTGGAC 3540

CCGGGAGAGC AAAGTGTCCT CCAGACCTTT TGAAAGTGAG AGGAAAATAA AGACCAGGCC 3600

AAAGACCCAG GGCCACAGGA GAGGAGACAG AGAGTCCCCG TTACATTTTC CCCTTGGCTG 3660

GGTGCAGAAA GACCCCCGGG CCAGGACTGC CACCCAGGCT ACTATTTATT CATCAGATCC 3720

AAGTTAAATC GAGGTTGGAG GGCAGGGGAG AGTCTGAGGT TACCGTGGAA GCCTGGAGTT 3780

TTTGGGAACA GCGTGTCCCC GCCGAGCCTG GGAGCCCGTG GGTTCTGCAA AGCCTGCGGG 3840

TGTTTGAGGA CTTTGAAGAC CAGTTTGTCA GTTGGGCTCA ATTCCTGGGG TTCAGACTTA 3900

GAGAAATGAA GGAGGGAGAG CTGGGGTCGT CTCCAGGAAA CGATTCACTT GGGGGGAAGG 3960

AATGGAGTGT TCTTGCAGGC ACATGTCTGT TAGGAGGTGA AACAGAATGT GAAATCCACG 4020

TTGGAGTAAG CGTCCAGCGC TGAATGTAGC TCGGGGTGGG GTGGGAGGGC CCTGGTGTGG 4080

ATCGTGGAAG GAAGAAAGAC AGAACAGGGT GCTAGTATTT ACCCCGTTCC CTGTAGACAC 4140

CCTGGATTTG TCAGCTTTGC AAGCTTCTTG GTTGCAGCGG CCTTGCCTGT GCCCCTTTGA 4200

GACTGTTTCC AGACTAAACT TCCAAATGTC AGCCCCTTAC CCTTGACAGC AAGGGACATC 4260

TCATTAGGGC ATCGCGTGCT TCTCATCTGT GCTCAGCAGG CCCGAGATAG GAACAGAGGG 4320

GCGTTGGAGA TGCCACTTCC ACCAGCCCTG GGTTGAAGGG GAGCGAGGGA GACACCTTTT 4380

ACTTAAACCC CTGAGCTTGG TCAGAGAGGC TGAATGTCTA AAATGAGGAA GAAAAGGTTT 4440

TTCACCTGGA AACGCTTGAG GGCTGAGTCT TCTGCCCTTC TGACTCCCCC AGCAAATACA 4500

GACAGGTCAC CAACTACTGG AGATGAGAAA GTGCCATTTT TGGCACACTC TGGTGGGGTA 4560

GGTGCCCGAC CGCGTGTGAA AAAGTGGGAA GGAGAGATTT CTGCGCACGC GGTTCAGCCC 4620

CCAGGCGCGG TGGCGCATTC AGGTACTCAG ACGCGGTTCT GCTGTTCTGC TGAGAAACAG 4680

GCTTCGGGTA GGGGCTCCTA GCTCCGCCAG ATCGCGGAGG GACCCCCAGC CCTCCTGCGC 4740

TGCAGCGGTG GGGATAGCGT CTCTCCGTAG GCCTAGAATC TGCAACCCGC CCCGGGTCCT 4800

CCCCGTGTCC TTCCCGGGCG TCCCGCCGGG GATCCCACAG TTGGCAGCTC TTCCTCAAAT 4860

TCTTTCCCTT AAAAATAGGA TTTGACACCC CACTCTCCTT AAAAAAAAAA AATAAGAAAA 4920

AAAGGTTAGG TTATGTCAAC AGAGGTGAAG TGGATAATTG AGGAAACGAT TCTGAGATGA 4980

GGCCAAGAAA ACAACGCTCG TGCAAAGCCC AGGTTTTTGG GAAAGCAGCG AGTATCCTCC 5040

TCGGCTTTTG CGTTATGGAC CCCACGCAGT TTTTGCGTCA AAGCGCATTG GTTTTCGAGG 5100

GCCCCCTTTC CACCGCGGGA TGCACGAAGG GGTTCGCCAC GTTGCGCAAA ACCTCCCCGG 5160

CCTCAGCCCT GTGCCCTCCG CTCCCCACGC AGGGATTTAT GAATGCAAAG AGAAGCGCGA 5220
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GGACGTGAAG TCGGAGGACG AGGACGGGCA 
CAACTTCACG CTGGAGCAGC TGAACGAGCT 
CGACGCCTTC ATGCGCGAGG AGCTCAGCCA 
GGTAGGAACC CGGGGGCGGG GGCGGGGGGC 
ACAGCACGCG TACAGCCACC TGCGCCCGGG 
AGGTTGGGTG AGGGACGGGC TGGGGTTCCT 
ATGGGTTCAT TGCGTTTGTT TTTCACCAAC 
ACAAATAACA AATAAATATA TATGTTATAC 
TTGTTCGTCC TTGGTGCAAA GACACCCGGT 
GGTTCCCCTG GGATTGGTTA TAGGGGCAAC 
CTTAGGAGAC GAAGCTACAG ATGCGTTTGA 
AAAAAAAATG TGTCTTTTGG CCCCTGATTC 
AAGGTTTCCT TAGGATGAAA GGAGAGGGGT 
GTCTTCTCTT TCCTCCGTTT TTTCACCTAC 
CCTGTCCTCT GCTACAAACC ACCCCCTCCT 
TTGGGCATCT GGATGAGCGG AGACTATTAG 
GAATTCACGC TGCCCCATGA GACCAGGCAC 
GAGGGACGGG CGGGCAGAGC CTTCCTCCGC 
TATTTCAGGG GTGCTGCTGG GAGAGCCTCC 
TTCAGCTCCC CTGGAAGGTC AACTCCTCTA 
AGTGGGGGGA GGGAGAGAGA GAGAGAGAGA 
GGGAACCCGT CAAAAATAAA TGAAATTAAG 
GCAAACGGCG TTCAAAGCAA AGAGACGAAC 
AGGTCAACAC ATTCAAACAC AGCTTGCTCT 
GCATTTGATC CAAAGTGTGT TACATCTTTC 
TATAAATATA TAAACATACA TAAATGTATG 
ATATATAAAC ACATATATAA TATATAAATC 
ATATAAACAT ATATAATATA TAAATATATT 
AAACATATAA ACATATATAA ATATATAAAC 
ACAAACATAT TGTATATATA TAAATATATA 
TATAAACATA TATACATATA AAGAAATATA 
TATAAACATA TATATACATA AAATATATAT 
ATTAACATAT ATATACATAT AAAAATATAT 
ATATATATTT TTGGCCCCTG ATTCCCTTCG 



GACCAAGCTG AAACAGAGGC GCAGCCGCAC 
CGAGCGACTC TTCGACGAGA CCCATTACCC 
GCGCCTGGGG CTCTCCGAGG CGCGCGTGCA 
CCGGAGCCAT CGCCTGGTCC TCGGGAGCGC 
CCGCCGCCGT CCCCTTCCCG GAGCGCGGGG 
GGACTTTTGG AGACGCCTGA GGCCTGTAGG 
AGCAAACAAA TATATATACA TATATATTAT 
AGATGGGTAT ATTGTATATA TTATAGATAT 
GAACCCATAT ATTGGCTCCT GACTGCCTTC 
ACATGCAAAC AAAACTTTCC CTGGATTATA 
TCCAGAGTGT TTTACAAGAT TTTTCATTTA 
CCCTCCGTCT TCCCGTGTGG CTGCATTGAA 
GTCCTCTGTC CCTAGGTGGA GAGAAACAGG 
CGTTTCTATC TCCCTCCTCC CCTCTCCAGC 
CCCTCCGGCT GTGGGGAGCG CAGGAGCACG 
CGGGGCACGG GGGCTCCCCG AGGAGCGCGC 
QGGGGGGCGG AGGGGCCTTG GGTGTCCGCA 
ATTCTAAACA TTCACTTAAA GGTATGAGTT 
AAATGGCTTC TTCCAGCCCC TGCCTGACAG 
GTCCTTTCTC CTGGTTCTGG GCAGGACAGA 
GAGAGAGACG GTCAGGATCC CCGGACCCTG 
ATTGCCGACC AGAGAGAGAA CCGTGACAAA 
TGAAAGCCCG TTCCCGTAGG ACTGGTTATG 
GGATTTTGCT GAGCAGAGGA AGATACAGAT 
ATTATATGTG TGTCTATATA TATAAACATA 
TAAATATATA TAATCTATAT ACATATATAA 
TATAAACATA TATAATATAT AAACATAAAT 
AACATATATA AAATATGTAT AAATATATAT 
ATATAAATAT ATAAACATAT ATAAATATAT 
TAAAAACATA TATATACATA TAAAAATATA 
TATAAACATA TATACATATA AATATACATA 
AAACATATAT ACATATAAAA ATATATATAT 
ATATTAACAT ATATATACAT ATAAAAATAT 
GTTCCTGTGG GATGGGTGAT TGAGTCAACA 

CATTCAAACA CAACTTTTCC ATCGATGTTG CTTAGGAGAT GAGGATACAG ATGCGTTTGA 7320

TGGAGAGGGT TTTACAAGCT CTTTCATTTA AATATATATA TATATATATA TATATTTTTT 7380

GGCTCCTGAT TCTCTTCCGT CTTCCCATGT GGCTGCATTT TAAAAGGCTT CCCTAAGATC 7440

GTTACGATTA AATCAACCCT CCCCAGGCAT CTTTACCGAG GGCTGTGGTC CCCAAAGCGA 7500

TACAGCCCAG GAGGGAGAGA GGCTTTGGTG ACTTGGAGGA AGGACTGTGT CCCTCCTTAG 7560

GGCGTCTGTG GCCTCAGTGA GGGAAGGAAG CTGCATCAGA CAGGGGTTTC CTCGCTGTCC 7620

ACCCCTCTGG CAGAAGATGG ATTGGGCTGC CCCGTATAAA TTAATGAAAA GATTAAAGTT 7680

TCGCTAAAGG GGACATCGAG TTTATGTGTC ATCTCCTGGT GTCTGTGTGC CTGGGATCTG 7740

CAATATATCC CAGCCCTTGA TGTACTGTTT CTATAAAAAT AAATTACTTG TAATTTAATT 7800

CCACACTATT TCTTTCCGTA GTCTATTACC GACGAGAGCA CGTTAGTTCA GCTGCGGAAA 7860

ATTGGTTGTG GGGTGTGTGC GGACCCCGAG AACGCCCTAA AATAAAGACA AATCGGGGAC 7920

AAGCTGGGGG TTATCGATTG CAGGGGTCGC ATGAAAATTT AACGACGGTA AATAATAATA 7980

AAAACAAACA TGGGAATGCA ATAAAAGACA TAATTCTCCA TCGCCGCGGG GGGAAAGGAT 8040

CCTATAGTAA AGGCGAGTGC GCTTTGAGGG GTCATAAAAA TCAATTAGTT CCAACACCCA 8100

CGTCCCGCGT TGAGGGGACG GGGACGAGCA GGGACAGAAA AAGAAACCAT ATTTGAATCC 8160

CATCTCTCTG TGAATTCTTG GGTCACATGC GTCTCAGTAC AGCCCGTCCC GTGCTGTGAC 8220

CGGATAGAGT TTCAATTTAC TGTGGAAATT TGCTGTAAAT AAATTGAGCA TCCGATAGAA 8280

GCTGTTGCTG ATTAACCTTT TATTTTTAGC GTGGCCCTGC AAAGTCGTAT CACCCAGCTG 8340

TCAGGCTTCT AATCGAAAGT TATGAGACCA CGGTGAGGGG CAGGCGGTAA TTTAATTACA 8400

ACAAATATCT TTGGGTTTAT GGCGCAGAGC TAAATTAAAT GTCATTATTC ACTGTCTGTA 8460

ATGGAAATCA AAAGGAAATC GCATTACGGC ATTTGGGAAA GAAAGCGGGG AGTGCTCTTT 8520

AATGAAGAAA TAACTGTCTT AAGCAGTGTC ACACACTTCA CTTACCATAT TCGGGCCTAA 8580

TTGGAATGGA TCGTGAATCA CTCCAAGACT GATTTATTAG CGCTTCACGC AGCGGCTAAT 8640

TCATCACTTG TATTCTTCAT CATTTTTTTT TTTCCTCTCG CCGTGTTGAA GGGAGAGTGA 8700

ATGAGGCTTT CCACGTTTCA GGAGGATTTT CTTTTTTGAA AAATGCCCTT CCAGAGGCTT 8760

TTGGGTGGCT GGCTTGCTTT CTGGGCCCTG GAGGAGACAG GCGGAGAGTC CAGGTGGGCA 8820

TGGAGAGGCA CAGTGGCAGG TCACCTGGAT GGTCAGTGGA GGTGGAGGTC TGAAGGCGCC 8880

AGCTTTGGAA ATTATTGGTG AATTTCGATG TCAGCACCAG GCAGGGGCCT TTTTGGCGGG 8940

GGTGTGAGGG AGGATGACTT TGCTGGGAAA CAGGATCAGG TTCTCCAGGC GCACTGCAGC 9000

CCGGTAGGAC CCACTTTGGA AATGAAAAGC CAGTTCCGAA AGCTGGGCTG GAAGCTTCCG 9060

TGTTGGGTTC AAGAGCAAGT TCACGTTGCG CTGTGTAGAC TCCTGGCTGC TCCCAAACTC 9120

TGAGGGTTTT CTGAGGTTCC CTTCATAGGG GCACCGGCCC TGGGCCATGC ACAGTGCGTA 9180

AGGGTGGCTG TGGGCCGAGG GACCCAGCAC GTGTTTTGCC CACAACAGCC GGAGTGACTG 9240
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GTTCACTCAC CGCCTTGGCG GAGGACGCCT GTTCTCTGGA CGAATCATTT CTCTTGGGTG 9300 

GTGACTGCCT TGTGGGTCAA GGTGCAGGTT TTCTGCCACA GAAAACCTGT TAGGAGGAAT 9360 

TAAGCGACTA AGACTGTCAG GGAGGTGGTG GTGGGGGAGA GGAGGGGGTG GTGTCCAGAT 9420 

TACCAGGCAT AGGCTAAACT GCCTGCACTC TCCAGCTGGT CTGTCTGTGG AGGAGGGGAT 9480 

TGTCAATACT GGGAGAGCAG AGGAGGCTCG TAGGAGGTGA GAGGGGGTGG AATTTGCATG 9540 

CAAATCTTCA CATGAGGCCT GTGTGAATTT CTCCAGCCTC CTGAGGGTCC CCTGCGCTAT 9600 

TGCACTCAAC TTCTTGATAG TTTACCCCAA GACTCAGAAG TCCTTAGAGG GGCAGAATGC 9660 

CCCCACCACA AAGCCTGCTA TCCTTGGGCG TCCTCAGGAC CCTTGGTCAT GAATGGGACC 9720 

CTTTCATGTA TGGGGACCCT TGGTAATATG AATGGGACGC CTTCAGCTCC CCAGGGCTTC 9780 

CGAGGAGGCC GAGAAGGGCA AAGACACTTC CGAGGAGGCC GAGAAGGGCA AAGACATTTT 9840 

CTGGGCTTGG TGTGTCAAGA GCTAGATTGG AGAAGGGGCT GGATTTGGAA CTCTTTAGCC 9900 

ATCAGCTCAC CCTCTCCGTT TGTGGCTAAA GTCTGAAGGT GGAAACTTCG GTTCTCCTAC 9960 

AGGGTCTACA GGAGTTGGGG GGCGGGGCGC CCACACAGAA CGCTGGAAAG TTCGACAGTC 10020 

CACTTCCACT GGCTCGGAAC TCACTTTTTC ACCTTAAGTT CATCAGCGGT AACGCATAGG 10080 

TCTCACTTAG GCAGGGCACG GATGATTTAA CAATTTCTAC TTCTAGGTCA GGTGCGGTGG 10140 

CTCACACCTC TAATCCCAGC ACTTTGGGAG GCCCAGGAGG GTGGATCGCT TGAGGTCAGG 10200 

AGTTTGAGAC CAGCCTGGCC AACATGGTGA AACCCCGTCT CTACTAAAAT ACGAAAATTA 10260 

GCCAGGCATG GTGGTGAGCA CCTGTAATTC CAGCTACTCG GGAGGCTGAG GCAGGAGAAT 10320 

CGCTTGAACC TGGGAGGTGG ACGTTGCAGT GAGGTGAGAT CACACCACTG CACTCCAGCC 10380 

TGGATGAGAG AGCAAGACTC TGTCTCAAAA ACAAAATAAA ACAAAAACAA AACAAAAATC 10440 

AAAAAAGAAA ACCCAATTTC CAGTTCTAGG CCAGGTGCAG TGGCTCACGC CTGTCATCCC 10500 

AGCACTTTGG GAGGCCCAGG AGGGTGGATC GCTTGAGGTC AGGAGTTCGA GACCAGCCTG 10560 

GCCAACATGG TGAAACCCCA TCTCTACTAA AAATACAAAC GTTAGCTGGG TGTGGTGGTG 10620 

TGCGCCTGTA ATCCCAGCTA CTCGGGAAGC TGAGGCTGGA GAATTGCTTG AATCTGGGAG 10680 

GTGGAGGTTG CAGGGAGGCG AGATAGTGCC ACTGCAGTCC AGCCTGGACC AGAGAGCAAG 10740 

ACTCCGTCTC AAAAACAAAA GAAAGCAAAA ACAAAAAACA AGAGACCAGC CTGGCCAACA 10800 

TGGTGAAACC GCGTCTCTAC TAAAATACAA AATTAGCCGG GCATGGTGGT GGGCACCTGT 10860 

AGTCCCAGCT ACTCGGGAGG CTGAGGCAGG AGAATGGCTT GAACCTGGGA GGTGGAGCTT 10920 

GCAGTGAGCC GAGATAGTGC CACTGCACTC CAGCCTGGGC GACAGAGCGA GACTTGATTT 10980 

CAGAACCACC ACCACCACAA CAAAACAAAA CAAAAAATCC AAAAAAACCC CAATTTCCAG 11040 

TACTAGGTAG TCAGTGATGC AGGGCTGGAG ACAGAGGGGC GGTAAGTGTC TGGGCGCCCA 11100 

CCATCAGTCA CCTCCCAGCT CCCAGAGGTG CAAAGTGCTT GGTTCAGCCT CATGGGAAGG 11160 

ATGCTCCCTG GGGAGGCTGG GCTGGGTTCA CAGGGCTCTT CACATCTCTC TCTGCTTCTC 11220 

CCCAAGGTTT GGTTCCAGAA CCGGAGAGCC AAGTGCCGCA AACAAGAGAA TCAGATGCAT 11280 

AAAGGTGGGT GTCGGGACTG GGGGGACCTG AAGCTGGGGG ATCCTGCTCC AGGAGGGATG 11340

GGGTCGACGA GGTGCTGGCT ACACCCAGGA CCACCACACT GACACCTGCT CCCTTTGGAC 11400

ACAGGCGTCA TCTTGGGCAC AGCCAACCAC CTAGACGCCT GCCGAGTGGC ACCCTACGTC 11460

AACATGGGAG CCTTACGGAT GCCTTTCCAA CAGGTAGCTC ACTTTTTCTT CCTCTGAAGA 11520

TCCCTAGGGA CCTGCTGCTC CCTTCCCCTT TCCCCTATTT GCTGCCGCAT CCTGACACTC 11580

CTAGTCCCTC CCTGCCCCTG CAGACTTCTC AGCTGGCCCT TAGAAAAAAA GCCTCTTTTC 11640

CGAGGAGGCA TTTACAGGCA CCTTGGCACC TATGAAATCA GGCTGGGCCA GGCGGGGTGG 11700

CTCACACCTG TCATCCCAGC ACTTTGGGAG GCTGAGGAGG GTGCATCACC TGAGATCAGG 11760

AGTTCAAGAC CAGCCTGGCC AACTTAACGA AACCCCGTCT ATTAAAAATA CAAAATGGGT 11820

GTGGTGGCTC ACGCCTGTCA TCCCAGCACT TTGGGAGGCC GAGGCAGGTG GATCACCTGA 11880

GGTCAGGAAT TCGAGACCAG CCTGACCAAC ATGCTGAAAC CCCGTCTCTA CTGAAAACAC 11940

AAAGCTTAGC CGGGCGTGGT GGTGCACACC TGTGATCCCA GGTACTTGGG AGGGAGAATC 12000

ACTTGAACCT GGGAGGTGGA GGTTGCCGTG AGCCAATATC GCGCCACTGC ACTCCACTCT 12060

GGGTGACAGA GTGAGACTCC AAGACTCCAT CTCAAAAAAA AAAAAAAAAA TCAGGCTGTA 12120

AAAATCCACT TTTGGGAAGG TGAACACACA CAAGCCCAAA CAGAAATCTG ACAAAAACCA 12180

GAGGGGTGAA AAGTCCACAC AGTCAGGCAC CCCCACCTGG CTTGCTGCCT GGTTAAGAAG 12240

GGCGCAGATG CCTGTGCCTG GATACCAGAG ATGGGACAGA CACCCATTCC CTTTTCATCA 12300

CCACCCCCGA GTGCCCGAGG GCCTGGGGCG TCTGCCTGGC CCCTGGCCCC TGGCTTGGGC 12360

TCTGCACCTC TGAACTGGAG ACACCCTACT CAGCTCCCCA CTTACTTTGG AGTGAGCAGC 12420

GCTTGGGTGC CCAGCGTGGA TTTGGGGCTT CCAGGGAGTC GGGGTTCGGT CGCGGAGCCC 12480

AAGCTTCCCA AGGGCGCCCC CGCCCTGCCC TGGCTTAGTG GTGGGGATGG GATGGGGGGA 12540

AACGGGGAGC TGCGTGGAAG GAGGTGAAGG GTCACAGGAG GAGAGAGCGC AGCGCCCACG 12600

TGCGCCCTGC CTGAACGCGC AGCGCAGCGC CCGGCTGCGG TGCCCCTTGC CCCTTCGGTC 12660

CCTAATTTGG GGATCGGGAG TGCATGCGCG GGCGGAACGG GCTTGGGGGG GGGGCTCTGG 12720

CAGGGCGGAC GCGTGGCCTC CCTTCTTCAC CGTTTTATTC CAAGGGGACA GGCTGGGGAT 12780

TGTATTTGGG CGCGTGTTTG GCTGAGGGTG CAGGGACTTG GGGGGTGGCG GTGGGGAGCG 12840

CGGAAGGTAT AAACGTATAA ATCATAAGTA AACAACTCAG AAATGGACCC CGAGCGCTGG 12900

TCGCCGCTAG CTCTCCAGCT CTCCCTGGCC CAGGCCCGAA GGAGAGGGGT CCGCATCCCT 12960

CCGCGGTTCT CCTCTCCTGG GTACCTGGCC TTGAGGTGGG GGAACGAGCC TACTTCTTGT 13020

ACCGTCTTTT GCCGACGGCG GGACCCAGTG AAATTAGGCC GTTGGAGCCC GCAGGCCTGC 13080

CTGGCTTTGC GCACCGGAGT CTTGGGGACC TGGTGTCCCC GGGAAAAACT TGGGGACCTG 13140

GTATCCCCGG GAGAGGCTTG GGGACCTGGT GTCCCGGGAG AGGCTTGGGT ACCTGGTTTC 13200

TCTGGAAGAG GCTTGGACAC CTGGTGTCCT GGGAGGGCCT TTGGGACCTG GTGTCCTGGG 13260

AGAGGCTTGG AGATCTGTTG TCCTGGGAGA GGCTTGGGGA CCTGGTGTCC CTGGAGAGGC 13320

TTGGGGACCT GGTGACCTTG GAGAGGCTTG GAGACCTGGT GTTCTGGGAG AGGCTTGGGG 13380 

ACCTGGTGTT CTGGGAGAGG CTTGGGGACC TGGTGTCTCT GGAAGAGGCT TGGACACCTG 13440 

GTGACCCGGG AGGGCCTTGG GGATCTGGTG TCCCGGGAGA GCCTTGGGGA CCTGGTGTCC 13500 

TGGGAGAGGC TTGGGGACCT GGTGACCTTG GAGAGGCTTG GGGACCTGGT GTCCTGAGAG 13560 

AGCCTTGGGG ATCTGGTGTC CCAGGAGAGG CTTGGGGACC TGGTGTCTCT GGAAGAGGCT 13620 

TGGACACCTG GTGTCCTGGG GAGAGGCTTG GGGACCTGGT GTCCTGGGAG AGGCTTGGGG 13680 

ACCTGGTGTC CTGGGAGAGG CTTGGAGATC TGGTGAGCCG GGAGAGGCTT GGGGACCTGG 13740 

TGTCCCGGGA GAGGCTTGGG GACTTGGTGT CCCGGGAGAG GCTTGAACAC CTGGTGTCCC 13800 

AGGAGAGGCT TGGGGACCTG GTGACCTTGG AGAGGCCTGG GGACCTGGTG ACCCGGGAGA 13860 

GCCTTGGGGA CCTGGTGTCC TGGGGAGAGC CTTGGGGACC TGGTGACCTT GGAGAGGCTT 13920 

GGGGACCTGG TGTCTCGGGA GTGCCTTGGG GACCTAGTGA CCCGGGAGAG GCTTGGGGAC 13980 

CTGGTGTCCC GGGAGAGGCT TGGGGACCTG GTGTCCTGGG AGAGCCTTGG GGATCTGGTG 14040 

TCCTGGGGAG AGGCTGGGGG ACCTGGTGTC TCGGGAGAGA GCCTTGGGGA CCTGGTGACC 14100 

CGGGAGAGGC TTGGACACCT GGTGTCCCGG GAGAGGCTTG GGGACCTGGT GACCCGGGAG 14160 

AGCCTTGGGG ACCTGGTGTC CTGGGGAGAG GCTGGGGGAC CTGGTGTCTC GGGAGAGAGC 14220 

CTTGGGGACC TGGTGACCCG GGAGAGGCTT GGACACCTGG TGTCCCGGGA GAGGCTTGGG 14280 

AGCCTGGTGT CCCGGGAGAG CCTTGGGGAC CAGGTGACCT TGGAGAGGCT TGGGGACCTG 14340 

GTGATCTTGG AGAGGCTTGG GGACCTGGTG TCTCGGGAGA GGTTACGGGG GCTGGTTGGG 14400 

GGAGAGAACG TTGTGAGCCA AAGTCCCTGA ATCCCTGCGA AAAGAGCGCA TCGGGAGCTC 14460 

CCCCTGAGGG CGTTCCATTT GTGGACCCCC CTCCCATGCG CTTTGCAGGG AGCTGTTCGG 14520 

ATTCCCCTGG CCCGGCTCCC GCGGATGCAT CCAGTGGCAG CGCCAATTCT GGGCCAGGGG 14580 

GAAGGAGGAA AGGCGGGTGT GGGGTGGTCT CCACGGCTGG AGAAGGGGCG ACGCTCCCTA 14640 

GGGGAGAAGA GGCACGTTGG GGGTTTCCGG GGGCGCGGGG CGGAGCAGGC CCCCCAGTCC 14700 

CCATCCTGCG CCCTCACCCC GCCGGGTCCG CTCCCGCAGG TCCAGGCTCA GCTGCAGCTG 14760 

GAAGGCGTGG CCCACGCGCA CCCGCACCTG CACCCGCACC TGGCGGCGCA CGCGCCCTAC 14820 

CTGATGTTCC CCCCGCCGCC CTTCGGGCTG CCCATCGCGT CGCTGGCCGA GTCCGCCTCG 14880 

GCCGCCGCCG TGGTCGCCGC CGCCGCCAAA AGCAACAGCA AGAATTCCAG CATCGCCGAC 14940 

CTGCGGCTCA AGGCGCGGAA GCACGCGGAG GCCCTGGGGC TCTGACCCGC CGCGCAGCCC 15000 

CCCGCGCGCC CGGACTCCCG GGCTCCGCGC ACCCCGCCTG CACCGCGCGT CCTGCACTCA 15060 

ACCCCGCCTG GAGCTCCTTC CGCGGCCACC GTGCTCCGGG CACCCCGGGA GCTCCTGCAA 15120 

GAGGCCTGAG GAGGGAGGCT CCCGGGACCG TCCACGCACG ACCCAGCCAG ACCCTCGCGG 15180 

AGATGGTGCA GAAGGCGGAG CGGGTGAGCG GCCGTGCGTC CAGCCCGGGC CTCTCCAAGG 15240 

CTGCCCGTGC GTCCTGGGAC CCTGGAGAAG GGTAAACCCC CGCCTGGCTG CGTCTTCCTC 15300 

TGCTATACCC TATGCATGCG GTTAACTACA CACGTTTGGA AGATCCTTAG AGTCTATTGA 15360

AACTGCAAAG ATCCCGGAGC TGGTCTCCGA TGAAAATGCC ATTTCTTCGT TGCCAACGAT 15420 

TTTCTTTACT ACCATGCTCC TTCCTTCATC CCGAGAGGCT GCGGAACGGG TGTGGATTTG 15480 

AATGTGGACT TCGGAATCCC AGGAGGCAGG GGCCGGGCTC TCCTCCACCG CTCCCCCGGA 15540 

GCCTCCCAGG CAGCAATAAG GAAATAGTTC TCTGGCTGAG GCTGAGGACG TGAACCGCGG 15600 

GCTTTGGAAA GGGAGGGGAG GGAGACCCGA ACCTCCCACG TTGGGACTCC CACGTTCCGG 15660 

GGACCTGAAT GAGGACCGAC TTTATAACTT TTCCAGTGTT TGATTCCCAA ATTGGGTCTG 15720 

GTTTTGTTTT GGATTGGTAT TTTTTTTTTT TTTTTTTTTT GCTGTGTTAC AGGATTCAGA 15780 

CGCAAAAGAC TTGCATAAGA GACGGACGCG TGGTTGCAAG GTGTCATACT GATATGCAGC 15840 

ATTAACTTTA CTGACATGGA GTGAAGTGCA ATATTATAAA TATTATAGAT TAAAAAAAAA 15900 

ATAGCCGTGC ACTCTTGACC CCGTCAACGT CCAACGTGGA AAAGGCGTTA CCTCTTCTCC 15960 

CAGCGCTGGC CGCCTGGCCA CTGAGGGCCC TTTGCAAAAA TCACGGGTGT AGAGATGGCC 16020 

CTGGGCGCGC TGGGAGTGTG GTTGTGTTTC TGAAGGGGAT AAAAGAGGGC ACGGTGGTGC 16080 

CAAGATATCA GTTTGGTACC TGAGCTGTTT CTGGTTGGGA AGCGTAAAAG CCAGGGAGAG 16140 

ATCCAGAGAG TTTTCAAGTT TTTGCAGATG TAGGTGGTTC CAGCTTTTCT TTCTCCCCTA 16200 

CTCCATCTTC TGCGTTCCCC CAGTTCTTTT ATTTCTTTGT TTTTTATTTT TGAGACAGAG 16260 

ACTTGCTTTG TCGCCCAGGC TGGAGTGCAG TGGCGCAATG TCAGCTCACT GCCACCTCCA 16320 

CCTCCCGGGT TCAAGCGATG CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA CTACAGGCAC 16380 

CTGCCACCAC CCCCGGCTAA TTTTTTGTAT TTATAGTAGA GACGGGGTTT CACCGTGTTG 16440 

GCCAGGCTCG TCTCGAACTC CTGACCTCAG GTGATCTGCC CGCCTCGGCC TCCCAACGTG 16500 

CCCCCAGTTT TATAAACAGC AGATAGCAAC TTGTCGTCAC AGCTGGCATG GGCTGGACAG 16560 

TTGCTTGAAA TGACCTAACC AAAAACATTC AAGGGTTCTG CCCCCAGATT TCGGGAGATC 16620 

CACGTTCCAT GTTCTGATTG GTTTTCTGGG AACACAGCAA GGGGTTTGGT GACCTCCGAG 16680 

AAGATCCATC TGCATGATTG GCATTAGTTA CCACAGCCTG CCCAGAGAGA AACTATCTTC 16740 

TCCCAACATT TACTAACATC CACTCGTCAA CTCTCTTATT TCCATAACAC ATTTGCATCT 16800 

TTCTGGATTC AAGCTTGGTG GTTTTCTTTC CTAACTTCTG ATTTAGATAC TTCTCCCTGA 16860 

GGTGGGGATA AAAGAAAAAA AAAAAACAAC TTCTTTTTTT CTTCCGCATA ACACTTTCTA 16920 

TCTTGTCACT GAGCTGAACT GTAGATCCAT TTGGACCCGT CTCATTTGTA TCTTCTGATA 16980 

TTCTTTATAC AAACCAAAAG TCCCCTTCAA CATTTTTTAT GTCAAAATGT TACAACCGCT 17040 

GTAAAATGAC GGAGAGAGAG AGAAAGAATC CCAGACATTA ACGGTATTAG AGAGTTTGCC 17100 

TCATTCATCC ATTTTTCTTA AAAGCTGGAA ATTAAAAAAA AAAAAGAGAG AGAGAGGCTT 17160 

TAATAGTTAA GCTGAAATTT TTATCGAAAA GAAGAATTGC ATTTTGAATC TTTGGGAAGT 17220 

AGGTTCATTC ATCAGAGTAT GTAACCCTTT GGAAAAGTGG TTGGTAAGAT ATGTACAGCC 17280 

CTAGATTTTT TTTTTTTTAA CCAAAAAGGC TGAGTAATTT TGAAAAATCG AAACATAACA 17340

GTGTGTCATC ATTTCCTCCC AAGAAAAAGC TCACTCCACG TGAGTAGAAA GACATCTACC 17400 

TGGTCCCTGT AGAATCTGAA CGTTTCTCTT TAGAGACGGA ATTTCAATCT TGTTGCCCAG 17460 

GCTGGAGTGC AGTGGCACAA TCTCGGCTCA CCGCAACCTC CGCCTCCCGG GTTCAAGCCA 17520 

TTCTCCTGCC TCAGTCTCCC GAGTAGCTGG GATTACAGGC ACCTGCCACC AGGCCTGGGT 17580 

AACTTTCTGG TATTTTTAGT AGAGACAGGG TTTCAGCCTC CCGAGTAGCT GGGATTACAG 17640 

GCACCTGCCA CCAGGCCTGG GTAACTTTCT GGTATTTTTA GTAGAGACAG GGTTTCAGCC 17700 

TCCCGAGTAG CTGGGATTAC AGGCACCTGC CACCAGGCCT GGGTAACTTT CTGGTAGTTT 17760 

TAGTAGAGAC AGGGTTTCGG CCTCCCGAGT AGCTGGGATT ACAGGCACCT GCCACCAGGC 17820 

CTGGGTAACT TTCTGGTATT TTTAGTAGAG ACAGGGTTTC GGCCTCCCGA GTAGCTGGGA 17880 

TTACAGGCAC CTGCCACCAG GCCTGGGTAA CTTTCTGGTA TTTTTAGTAC AGACAGGGTT 17940 

TCGGCCTCCT GAGTAGCTGG GATTACAGGC ACCTGCCACC AGGCCTGGGT AACTTTCTGG 18000 

TAGTTTTAGT AGAGACAGGG TTTCAGCCTC CCGAGTAGCT GGGATTACAG GCACCTGCCA 18060 

CCAGGCCTGG GTAATTTTTT TGCATTTTTG GTAGAGACAG GTTTTTGCCG TGTTGGCCCG 18120 

GCTGGTCTCA AACTCCTGAC CTCAGGTTGA CCTGCCCGCT TTGTCCCTCG CAAAGTGCTG 18180 

GGATTACAGG CGTGAGCCAC CACACCTGGC CTGAATCTGA ACTTTTAAAA GGGAGTTACT 18240 

GACTCTCAAC TGTGCGGGGA CGGTTTCACT TTGATTTAAT ATGGAAAGAG GGCCAAGTGT 18300

CATCCTCACA AATGGGTCCC CGAAGCAGAT CAAACGCAGA GAACTGTGAG GGTGGGACAC 18360 

GAGTGTCTGT GGACACTGGC TGCCTTTGGC TTTTCTCCTG CGAGAGAAGT TGGGTGACTT 18420 

TCTGTAGGTG GATGAGTGAT CCCTGAATGA GTGTGGGGTA CGTGTATGCT AGCTGCTTCT 18480 

TTCTCCCTGA AACTCTCGGA TGGAAGGAAG TAAGAAATTC AGCTTGGGCT GTGACCAGTT 18540 

CTCACCACCA ACGCCCTCTT CTCTCTCCCT TCTCCTTCCT TCCTTCCTTC CTTCCTTTCT 18600 

TTCTTTTTCT TTCTTTCTCT CTTTCTTTCT TTTCTTTCTT TCTGTTTCTT TCCTTTTTAT 18660 

CTTTCTCTCT TTTTCTTTCT CTTTTCCTTT TTTGTTTCTT TCTTTCTTTT TCTTTCTTTC 18720 

TTTTTCTTTC TTCTTTCTTT CTTCGATGAA GTCTCACTCT GTCACCCAGG CTGGAGTGCA 18780 

GTGGTGCAAT CCCAGCTCAC TGCATCCTCT ACCTCCTGGC TTCAAGAAAT TCTCCTGCCT 18840 

CAGCCTCCCA AGTAGCTGGG ATGACAGGCA CCCACCACCA TTCCCGGATA ATTTTTGTAT 18900 

TTTTTAGTAG AGACTGGGTT TCGCCATGTT GGCCAGGCTG GTCTTGAACT CCTGACCTCA 18960 

CATGATCCAC CCGCCTCAGC CTCCCAGAGT GCTGGGATTA CGGGGTGAGG CACCGCGCCC 19020 

GGCCTCCTCT CTCTTTTTCT GAGATGTTTA GGAAGGACTG GGCTGATGGG GACCCTCTGT 19080 

ATGTGATGTG CGTGGGTTTG GTTTCCCGGA AGGCCCTCCA GAGACACGTT TGCGTGAACA 19140 

TTCAGCATGG AAACAACATA CGTCTCTCCA CAGGAGGTGA GAAATTGAAT TTATGGGGTG 19200 

GGTGTACGCT GGCGATTCTT GGTGCTTTTT GCTCAAAACA AGGTTCTTTT GAAAGTCACG 19260 

TTCCTGCTTT CCCTGTGGCT TCCCGGTGAG CTCGCTCGCA GAGCAAGGAA TACCACCCAG 19320 

AGAGCAACGT GGGCTGTGTT CCGTTGTAAC GCCGTTGCAG AGAGAGGATT TGGTGTGTGA 19380

GATCCGTACC AGCTCCAGCA CACTGATAGG AACACGTTGC TGGCCGAACT GAACGATGCT 19440

GGGTTGGGTC CTGATTGATA CGTATTTTCT TCCCTCCTCT CCCCAAAACT TGGCCAAATA 19500

GTCCGTGGAG GGTTGTCAGT CGCCGCAGTT GAGCAAAAAA CACTTCTTCC TTTGAGTGGC 19560

TGTTCTGGTG AAATCTGTTT CTGACATATC CACTTTTCTC TCTCTTTTCT CTCTCTCTGA 19620

CTGCGAAGCA CCCACAGGGA GAAGGAATTG GATGTATCGG ATGTTGGTAT TAGATTTTCT 19680

TTCTCCGTTC GAGTCTCTGA CTGGTGCATA CTTTGCAAAG GTGTGTTCCT GGCAATTGCC 19740

AAGAGTTAGA AAAATGCACC TTCTCTGGTG GCCGTTGGGG TGTTGTTTCA CAGGCAGTGG 19800

TGACAGGGCC CCTTGGCTGT GGCTGTCTTC TCCAGCGCCG TGGATAAAGA GACGGGACAG 19860

ATTCTGTGCC TCTGTACGAT TTAGAGCGTA ACTGACCGCG TCCAACACCC GTTTTTCCAC 19920

TTACAAAGCT GGTGGTGCGA CGGGCTTGGT GTCTCCCGTA CGGGAAGGAG GCCTTTGGGC 19980

CGCTCCAAAG ACGCCCTGTC GTAGGAATGG CCTCTCCATC CCGCCAAAGT CCAGCCAGGC 20040

CCCCGAAATG GTCCCATTTC CTTGGAAGCC TGAGTTTCTG TTCTGGTCTT GCTGCTGTCC 20100

TTGGCCACGT CAGCACGTGG GAGCATCTGT GGATACCGCA GAGTCTGGGG ACAGCTGGGC 20160

GTTTAACCGA AATGAAGCCG AGACGGGTTT CAGGTTTTGG TGCCAAGCTC TGGTCAGGAT 20220

GAAAGGGAAA TACCAGAGTC CTCTGTCCTC GCCTCTGGGT TTCATGCTGA CCTTTCTAAC 20280

ATTTGTTTTC CCCTAAGAAC AAGCAGAAGC CTCCAGCTCC CTTTAGCTCC ACAGTTTTCC 20340

CGGGGACATA GCGAGGATGG CACACGGCAG CCACTCCCAC GACACACATT TCGGAGGCAC 20400

TTTGCTGGAA GCCGCTTGTC TCCTCCAGCT TTGGGAGGTC TGGGGAGGAG AGAGGCTTTC 20460

GGTGGACACG TTTGACATTA AAAAAAAAAA AAAAAAAAAA AAAAAAACTG GTGCCTAATT 20520

TATTAAAGAG AATTAGCTTA GCGAGTATAT GCTGATATTC TTCGACACAC GTGGGTAAGT 20580

TGATGCCATT TATAAATGTT TTATTGAAAT TTGATATTTA ATGAGAAGCC GGTTAAGGAA 20640

TGTAGACAAT ATCCCGTTTC AAAGCTATGA AATGTGCTAT TTATTGAAAG GGGATGTGGC 20700

TTCACGAGTT CAGCCCATTG TACGTGCAGG TCCCGTGGGA AGGAGGCAAA AGCCCCTGCT 20760

TCTTACTTTG TGATGTATGT GCATTTGTTA TTTATTTTTT TTTCCTTGGT CGGACGTTCA 20820

TAAATATGTA CTATTTTAAT TATGTCGAGT GTAAATTTGA CATCGCGTTG CATTTATTTT 20880

TATATTTCTG AAAACTGTTG CTTTTTCTTT TTCCCTCCCC CATTGACGAC ATAGCGGCCC 20940

CCGCGTCCGG GTTACAAATA CATCTACAGA TATTTTCAGG* GATTGCTTCA GATGAAAACA 21000

AATCACACAC CGTTTCCCAA ACCAACAGTC TTCACATTTC TATCCCTCTG TTATTGTCGG 21060

CAGGCGGTGA GGGGTAGAAA AAAAACAAAC AAACAAACAG AAAAAAAAAC CAAAAAAAAC 21120

CACCCTGAGT TTCTCTGGTG ACGCCCTCAT TCTCCTAACG TTCAATAATC TCAATGTTGA 21180

GTTGCAGCAA CAGACTGTAT TTTTGTGACG CCCCGTAGTA TGAATGTACA TCTTGTAAAA 21240

CTGAGATATA AATAAACTTA TAAATATTTG TATTCAAGTG TTAAAAAAAA AAAAATTCTC 21300

AACCTCTCCC CTGAGGACAG GCTTATTGGA AAAAAAAAAA AAAAAAAAAA ATCCTGAGTC 21360

GGCCGTGGCT GAACACAGAG TGTTGTTCTG CTCCGTGCAT TTCCAGGGTG GGTACCCAGT 21420 

GTTGCCCCCC AGCCTTAGAT CGGGAGGTAC CATTGACTTT TGCTTGTATC CCATCCCCTT 21480 

CCTTTACTGA AACCTACCTC CCCGCTTCTC AGCCAACGTC CCCCCAGAAG GTGGCAAAAA 21540 

AAACAGAGGA AAAAGCCCTG ATTTGAATCA AGTCAGAGCT GCTAATTCTC CACTTTCTTT 21600 

AATTAATTAA TTTATTTTTT TTTTTGAGAC TGAGTCTCGC TCTGTCGCCC AGGCCGGAGG 21660 

AGTGCAGGGG CGCGATCTCG GCTCACCGCG ACCTCCGCCT CCCGGGTTCA AGCGACTCTC 21720 

CTGCCTCAGC CTCCCGAGTA GCTGGGATGA CAGTCACCTG CACCACCGCG CCCGGCTCAT 21780 

TTTTGTATTT TTAGTAGCAA TGGGGTTTCA CCGTGTTGGT CAGGCTGGTC TCGAACTCCT 21840 

GACCTCGTGA TCCACCCGCG TCTGGGCCCG GCCGGTGATG TGTGTGCTTT TAACTTTTAT 21900 

TTTGTTCCAG TTTTCGACAG TGGCACGGAT TTTCCAGCAC GGTCTTGCAA GGATGATTGA 21960 

GTCATTTTTG AGACAAAAAA TATAATAATA ATAAATGGAA AAAGAAATCG ACTTTTAAAA 22020 

ATGACAAATT TTTTTTTTTT TTTTTTGCAT AGATTTTTCT CTCTTTATGT AAAGGAAAGT 22080 

TCATGATTGG ATTTGGCCGG CCTGACTGCT TCCCGGCTGT GATAAAAAAC ACATGTGAGC 22140 

TGGGAGGGAA GTGGGGGAGG GACACAGCTG CCCACACAGG GTTCCCACCG CGGTTACAGG 22200 

GTGGGCAGTG CTGGGGGAGC TTTCTCTGTG GGGGGCTCAG AGCCTGAGGA CAGGTGAGCC 22260 

TCTCCGACAC CTCCCCAGTT GCCTGGAGTC TAAACCGTCC GTTGTCTGTA CCGTCCGTTC 22320 

TTCCTGCTGA CTCCTGGTAG TTCCTGAAAG CTTCTCTTGG CCAGAGAAGG GGTTTCAGAG 22380 

GCCGTGTGTC CAGGCCATTC TGCAAAGTGC AACTTGACCG TTCCTTTCCT TTTCTGGCCT 22440 

GCGTGGTCTG AAGCTCAGAG CCCTCTCTTC ACCCAGCCTG TGTGTGTCTT GCCGGACAGA 22500 

AGAAAAATGG TGCTTTTTGC GTGTTAGCAG AGGTGCTTTT CATGGCTGAC CTCAACGCGT 22560 

CCATCTCCAG CCTTGACCAA GCTGTTTTTT AGGGGCAAAC GCAGGCAAGT TCTGAATGCA 22620 

CACAGTTATT TCATGGTTAA ACTATTCAGC TTTGGCCGGG CGCAGTGTGG CTCTCACGCC 22680 

TGTCATCCCA GCACTTTGGG AGGCCGAGGC GGGTGGATCA CCTGAGGTCA GGAGTTCGAG 22740 

ACCAGCCTGG CCAACACGGT GAAACTCTAT CTCTACTAAA AATACAAAAA TTAGCCGGGC 22800

GTGGTGGTGT GTATCTGTAA TCCCAGCTAC TCAGGAAGCT GAGGCAGGAG AATCGCTTGG 22860 

ACCCAGGAGG CGGAGGTTGC ACTGAGCCGA GATCGCGCCA TTGCACTCCA GCCTGGGCGA 22920 

CAGAGCCAGA CGCTGTCTCA AAAAAATGAA TAATAAAATA AAATAACAGG AACTAAATAA 22980 

AATAAAACGT TCAGCTTTGT TCTGCAAATC CACTCCTATT GTTTTACGTG GTTTGAGAGA 23040 

CTCTGTCCCT TAGAAATAGA TGTTTGTTGC CAATTGTAAT GAATCTGTTT CAAAAATGAA 23100 

CAGAATATTC AAATGGTTTG AGAGATCTTT TCCCTTAGAA ATAGCTTGTT GCCAATCACA 23160 

AAGAATGTTT TTCAAAAATG AATGGAATCT TCCTGGATAT CGCTTCCAGA TCTTCATTTT 23220 

TTTTGCATAG TTCAACCTGA AAAGTAAGTG TCTCAGCCCT GAATTTCTTT CTGATTTTTC 23280 

CATGGGTTGT CTTGCAGACT TCTCTGGACT TGACCACATT TAAAAAAAAA AAAATTAACT 23340 

TTTTCACACG GACACGGTTT CAATAGGAAT GAGATCTTTG AGTTTTTATG TAACAGATTC 23400

TTACCATCAG TTCTCAGATT CCCAAATTAC ACACAAAAAG CCACGGACTT CGCCTCCTGC 23460

TAACATGTCC TTCTGTTTCT GAGGCTTCTG TTGGTGTTAG ACTTTCATGT TTAATAGCAG 23520

ACAATGTAGG GATTTAAAGA AAAATGCAGA GAAAGCAAAA ACACTGACCA AACACACGGA 23580

GATAAGCTTT CTAAAGCCTT TGTTCTTGGA GTTGTCGTTA AAAAAAAAAA GTTGTTTTAA 23640

ACTTTGCAAG CATGCCTATA TTGAACTCAT AAGCAAGAGA GCCAAGAAAA ATAGTGTCGG 23700

TCGTCTACTC TACACGTTTT CCCAAAACAG ACGTATTTTA ATTTCTTTTG TTTGAACTCA 23760

CAGATGCTGA GAGTTAAAAG TTAAATTTTT GTCATGAACA ATAGTGGCCA AAACCACAGT 23820

TACTTTTGCA CTATAGCATA ATAAGAAAAA TACAGGCTGG GCTCGGTGGC TCACACCTGT 23880

AATCAAAGCA CTTTTGGAGG CGAAACAGCC AGATCCCTTG AGCCCAGGAG ATTGAGACCA 23940

GCCTGGGCAA CATAGCGAGA CCCTCATCTC TACAAAAAAG GTTTGTTACA TATGTAACAA 24000

ACCTGCACAT TGTGCACATG TACCCTAAAA CTTAAAGTAT AATAATAAAA AAATTAAAAA 24060

AAAATTCACC AATCAACTGC CTGCTGGTGC CTTCAAGAGA CTCACCTAAC ACATAAGGAC 24120

TTGCATAAAC TTATAAAACA ATTCAATGGA AGAATCCTTG AAAGTATTCT GAGAAGACAG 24180

TATAATAAAC TGATTTCTAA AAAGGCTATA AAAAATTGAA TAAATCATTG TTGGGCATCC 24240

TGTGCTGAAA TATAATGCAG CCAATAAAAA TTACAAAATG AATAAACATT TTATAACAAT 24300

AAAAAAAAGT CAAATAATTA GGCAGGCATG GTGGTGCTCT CCTACGGTTG AAGCTATTCA 24360

GCAGGCAAGA GGATACTTTG TTTTTGTTTT TTAATTTTTT TTGAGACAGA GTCTCGCTCT 24420

GTTGCCAGGC TGGAGTGCAG TGGCGTGATC TCAGCTCACT GTAATTTCTG CCTCCCGGGT 24480

TCAAGCGATT TTCCTGCCCC AGCCTCCCGA GTAGCTGGGA TTACAGGTGC CCGCCACCAC 24540

ACCTGGCTAA TTTCTTTTGT ATTTTTAGTA GAGACGAGGT TTCCCCATGT TGGCCAGGCT 24600

GGTTTTGAGC TCCCGACCTC GGGTGATCCA CCCGCCTCAG CCTCCCAAAG TGCTGGGATG 24660

ACAGGCGTGA GCCACCGCGC CTGGCCCAGG AGGATTATTT GATCCCAGGA GGTGGAGGCT 24720

GCAGGAAGCC ATGATTGCAC CACTGCACTC CAGCCTGGCT GACAGAGTGA GACCACATCT 24780

CTAAATAAAT GAATAAATAC AGGCAGAAAC TTTTTTTGTT TTGTTTTGAT GGAGTCTTGC 24840

TCTGTCACCA GGCAGGAGTG CAGTGGTGCC ATCTCAGCTC ACTGCAACCT CCACCTCCTG 24900

GGTTCAAGCA ATCCTCCTGC CTCAGCCTCC CGAGTAGCTG GGATTACAGG TGCCCGCCAC 24960

CACGCCCGGC TAATTTTTTG TATGTTTAGT AGAGACGGGA TTTCACCGTG TTAGCCAGGA 25020

TGGTCTTGAT CTCTTGACTT TGTGATCTGC CTGCCTCAGC CTCCCAAAGT GCTGGGATTA 25080

CAGGCATGAG CCCAGGAGTT CAAGACCAGC CTCAGCAACA AAGTGAGACC TTTTCTCTCC 25140

AAAAAATCAA AAATTTAGCC AGCTGTGGTG GCTCCTGCCC GTGATCCCAG TACTGTGGGA 25200

GGCTGAGGCA GAATTGCTTG AGCCCAGGAG TTCGAGACCA ACCTCAGCAA AAAGGACTCT 25260

CTCTCTCTCT CTCTCTCTCT CTCTCTATAT ATATATATAT ATATATATAT 25320

GAGTTTCAAA AATTGCTGGG TGACCAGCTC ATCTACTGGT TTTCCCCTTG GGAAAGTGAA 25380

ATTGTCATGT ATTGAAGATT TCCAAGGAAG TTGTATTGAA TGAGAAACAA ACTCAATCTG 25440 

TTCGTGTTTA AAGAGCTGCA GTGCGTTTGC TGTGTTTCCC ATAAAACTGC ACTTCCAAAA 25500 

GACACGCTGA GAAAGGAGAC CAGGATTTGT AATTCAGAAA TTGGAAAGCA AGTTAGGCTG 25560 

GACGTGGTAG CTCATGCTTG TTGTAATCTC AGCACTCTGG GAGGCTGAGG CAGGAGGATC 25620 

ACTTGAGCCC AGGAGTTCAA GACCAGCCCG TGCCACATGG TGAAACCCTG TCTCTCCAAA 25680 

AAATAAAACA TTTAGCCAGA TGTGGTGACT CATGCCTGTA ATCCCGGTAT TCTGGGAGGC 25740 

TGAGGCAGAG TTGCTTGAGC CCAGGAGTTC AAGACCAGCC TCGGCAACAA AGTGAGACCC 25800 

TGTCTCTCCA AAAAATAAAA CATTTAGCCA GCTGTGGTGA CTCATGCCTG TAATCTCAGT 25860 

ACTCTGGGAG GCTGGGGCAG AATGGCTTGA GCCCAGGAGT TCGAGACCAA CCTCAGCAAC 25920 

AAAGTGAGAT CTTGTTTCTC CAAAAAATCA AAAATTTAGC CAGCTGTGCT GGCTCATGCC 25980 

TGTAATCCCG GTACTCTGGG AGGCTGAGGC AGAATCGTTT GAGCCCAGGA GTTCGAGACC 26040 

AACCTCAGCA ACAAAGTGAG ATCTTGTTTC TCCAAAAAAA TCAAAAATTT AGCCAGCTGT 26100 

GCTGGCTGGT GCCTGTAATC CCGGTACTCT GGGAGGCTGA GGCGGAATTG CTTGAGCCCA 26160 

GGAGTTCAAG ACCAGCCTCA GCAACAAAGT GAGATCTTGT TTCTCCAAAA AATAAAACAT 26220 

TTAGTCAGCT GTGGTGGCTC AAGCCTGTGA TCCCAGCATT TTGGGAGGCC GAGGCGGGCG 26280 

GATCACGAGG TCATGAGATC GAGACCATCC TGGCTAACAC GGTGAAACCC CGTCTCTACT 26340 

AAAAATACAA AGAAAATTAG CCGGGCGTGG TGGCGGGCGC CTGTAGTCCC AGCTACTCAG 26400 

GAGGCTGAGG CAGGAGAATG CCGTGAGCCT GGGAGGCGGA CCATGCAGTG AGTCAAGATC 26460 

GCGCCACTGC CCTCCAGCCT GGGCCACAGA GCAAGACTCC GTCTCAAAAA AAAAAAAAAA 26520 

AAAACTGCTG CCCAACCTGT GTTTGCACCA CTGCCCTCCA GCCTGGGCAA CAGAGCAAGA 26580 

CTCCGTCTCA AAAAAAAAAA AATGCTGCCC AAGCTGTGTT TGCACCACTG CCCTCCAGCC 26640 

TGGGCAACAG AGCAAGACTC CGTCTCAAAA AAAAAAAAAA AAAATGCTGC CCAAGCTGTG 26700 

TTTGCACCAC TGCCCTCCAG CCTGGGCAAC AGAGCAAGAC TCTGTCTCAA AAAAAAAAAA 26760 

AATGCTGCCC AAGCTGTGTT TGCACCACTG CCCTCCGGCC TGGGCAACAG AGCAAGACTC 26820 

CGTCTCAAAA AAAAAAAAAA AATGCTGCCC AAGCTGTGTT TGCACCACTG CCCTCCAGCC 26880 

TGGGCAACAA AGCAAGCCTC AGCTTTCTGC CATCTCCACA ACCAAGAAAG CAATTCACAC 26940 

AGAAATCAGT GCATCGTGCA GTGACCTCTT CAGAAAACCA ATGAGTTTTC CACCTGAGGA 27000 

ACTGTTTCTG AGCCCCATTC AGAAAAACAC ATCCCTGTAA CTGCAGGGCA GATTTACTCA 27060 

CTGTATGCCT GTTTAAATAA AGCTTCCAGC CTCTGCATGG GGTCTGTCTG GAAGCTCCTG 27120 

TATCTGTCCC ACATTCTTGG AATCACAATG CACCCTTGGG AGGAAGATAT GTATTTAAAG 27180 

GGAGTGGATG TTATGGTGAG AAAATGCTGC CCATCCTTCT AGAAGACAAA AGCCACACAA 27240 

AATACATCAC AAGAACCAGT TTTTTTCAGA GAAGAACCTG CACAAAGAAC CTGCTCCCCC 27300 

CACACCCCCA CACACAGGTG AATTAACAGG ATGTATGTTT TATCATAAAA GCACAGGTTT 27360 

GTTTCCTATG CACTCTCTGA GGATTTGGCC ATATGCAAAG ATGTACAAAA ACCTTCTCTT 27420

TCCCCAGGGA ACCGTAACCC GTCTGAAAAG ATGCCCTTCT CAGAAGCGAG TTGAACGATT 27480

GTTGGAAAAG ATAAAATACG ACGTGCACAC ACACAGTAGA GAAATGTCAC CCATGCAAAT 27540

TATGTGTTTG AATGGAACAC ATTCAGGAAG CTAAATGGGG TATGACCACA CATTTGGGTT 27600

GATTTATTTG ACGAGTGGAA GGGGCAGATG GAAATGAATA CTGCTGTTTT CCTTTGGAAG 27660

GCCATATATG GGAATACCAA GAGGATTACT TTGGAAGTTT AGCTTCTCCA GGTGGTCTCT 27720

CTCTCTCTCT CTTTTTTTGA GACAGAGTCT CACTCTGTCA CCCAGGCTGC AGTGCAATGG 27780

CGTGCTCTCG GCTCACTGCA ACCTCAGCCT CCCAGGTACA AGCGATTCTC CTGCCTCAGC 27840

CTCCCGAGTA GCTGGGATCA CAGGTGTGCA CCACCACGCC TGGCTAATGT TTGTATTTTC 27900

AGTAGAGATG AGGTTTTACC ATGTTGGCCA GGCTGGTCTT GAACTCCTGA CCTCAGGTGA 27960

TCCGCCTGCC TCGGCCTCCC AAAGTGCTGG GATGACAGAC ATGAGCTAGC ACGCCCGGCC 28020

CCAGGTGGTC TTTTTAGCGG GTATTAAAGC AGCTTTCTCT CTGAGCCTTA AACCATGAAG 28080

ATAGACAGAC TCAGTGTATG GGTTTTAGAG TTGTAATTTT ATAAAAATAA GAAAAAGTCG 28140

ACCTATCATT GATGGTTAGT ATTTTTTGTA GCAGTTGCAT GCAATATTAG GATAAGGCAT 28200

GTTCTCAAAA AGAACTCTTT TTTTTTTTTT TTTGAGACGG AGTCTCGCTC TGTCACCCAG 28260

GCTGGAGTGC AGTGGCACGA TCTCCGCTCA CTGCAAGCTC CTCTTCCCGG GTTCACGCCA 28320

TTCTCCTGCC TCAGCCTCCC CAGTAGCTGG GACTACAGGC GCCCGCCACC ACGCCCGGCT 28380

AATTTTTTGT ATTTTTAGTA GAGACGGGGT TTCACCATGT TAGCCAGGAA GGTCTCGATC 28440

TCCTGACCTC ATGATCCGTC CGCCTCAGCC TCCCAAAGTG CTGGGACTAC AGGCGTGAGC 28500

CACTGCACTT GGCCTTTTTT TTTTTTTAGA TGGAGTTTTG CTCTTGTCGC CCAGGCTGGA 28560

GTATAATGGC ATGATCTCGA CTCACTGCAA CCTCCGCCTC CCGAGTTCAA GCGATTCTCC 28620

TGCCTCAGCC TCCCGAGTAG CTGGGATTAC AGGTGCCCAC CACCATGTCA AGATAATGTT 28680

TGTATTTTCA GTAGAGATGG GGTTTGACCA TGTTGGCCAG GCTGGTCTCG AACTCCTGAC 28740

CTCAGGTGAT CCACCCGCCT TAGCCTCCCA AAGTGCTGGG ATGACAGGCG TGAGCCCCTG 28800

CGCCCGGCCT TTGTAACTTT ATTTTTAATT TTTTTTTTTT TTTAAGAAAG ACAGAGTCTT 28860

GCTCTGTCAC CCAGGCTGGA GCACACTGGT GCGATCATAG CTCACTGCAG CCTCAAACTC 28920

CTGGGCTCAA GCAATCCTCC CACCTCAGCC TCCTGAGTAG CTGGGACTAC AGGCACCCAC 28980

CACCACACCC AGCTAATTTT TTTGATTTTT ACTAGAGACG GGATCTTGCT TTGCTGCTGA 29040

GGCTGGTCTT GAGCTCCTGA GCTCCAAAGA TCCTCTCACC TCCACCTCCC AAAGTGTTAG 29100

AATTACAAGC ATGAACCACT GCCCGTGGTC TCCAAAAAAA GGACTGTTAC GTGGATGTTC 29160

TAGCTTCCTG TTCTCGTCTT TTCTTTGTTA ATTGTACAGT TTGAGGGTGT GTGTGCGTGT 29220

GCGCACGTGT GTGTGTGCAG TCTCCTGATT TCATGTATTT AATTGTTATT ACCACCACCT 29280

CCATCTCTCA TTCCTTCTTA CCCTCACTGT GTAAAGATAC ATGTTGTTTT TAAATTTTAT 29340

GTATTTATAT TTATTTATTT GTATTTCTGA GACAGAGTCT CACTCTGTTG CCCAGGCTAG 29400

TGGCATGATC TCAGCTCACA GCAACCTTTG CCTCCTGGGT TCAAGCGATT CTCCTGCCTC 29460

AGCCTCCCGA GTAGCTGAGA TTACAGGCAC ACACCACCAC ACCCGGCTAG TTTTGTTTTG 29520

AGACGGAGTC TCGCTCTGTT GCAGGCTGCA GTGCAGTGGC GTGATCCTGG CTCACTGCAA 29580

CCTCTGCCTC CTGGATTCAA GCGATTCTCC TGCCTCAGCC TCCCAAGTAG CTGGGATTAC 29640

AGGCGCCCAC CGCCACACCT GGCTAATTTT TTATTGGTAG TAGAGACGGG GTTTCTCCAT 29700

GTTGACCAGA CTGGTCTTGA ACTCCCAACC TCGGGTGATC CACCCACCTG GGCCTCCCAA 29760

AGTGCTGGGA TGACAGGCGA GGGCCACCGC GTCCAGCCTT CTTCTTCTTC TTCTTTTTTT 29820

TTTTTTTAAG ATGGAGTTTC ACTCTGTTGC CCAGGCTGGA GTGCAGTGGT GCAATCTCGG 29880

CTCCCTGCAA CCTCCACCTC CCAGGTTCAA GAAATTCTTT TGCCTCAGCC TCCCGAGTAG 29940

CTGGGACTAC AGGTGCCCGC CACCACACCC ACCTAATGTT TGTATTTTTT TGGTAGAGAC 30000

GGGGCTTCAC CACATTGGCC AGGCTGGTCT TGAACTCCTG ACTTCAGATG ATCCTCCTGC 30060

CTCAGCCTCC CAGAGTGTTG GGATTACAGG CGTGAGCCAC GGTGCCCGGC CAGACGTCAT 30120

GTCTTAGGAA ATCAGAAAGT GGGTAGTTTC CGCACTCTGA GGAGAAAAAG AGACGTCCGG 30180

CGAAGAGAAA GGAGAGTGAA AGGATGTCTC CTCTTGTCTG TAGCCTGTTC TCAATCGTGA 30240

GTGAGCCAAT TGCCAGAAAC TGAGGGTGCT TCATTTGGCC AGGCAAGCTT CTCAACAGAA 30300

TGTCTAAGTA CTTGTTAATG CTGAGAAGCT CTCCAAGCTA CTGCACTCCA GCCTGGGTGA 30360

CAGAGCACGA CCTTGTCTGA AAACAATTAA TTAATCAATT AATTAATATA ATGAAATCAT 30420

ACTGAACTCA GGAGACCATT GGGGTGGGCA GGGCTGGGGT TGGAAAGGAA CATAAAATAT 30480

GGTGCAATGG ACTTTGCTCC AGTCTCCCTC CCCATCTCTT CTCGCCAAGA GTCTCTGGAG 30540

GGAGCATGGG GAAGATGCTT TGGGAATCTG TAACTTCTTG TCTTGTAAAC AGAATATCTA 30600

AGTAATTGTT AATGCTGAGA AGTTATAGAT TTCCAAAGCC TTTCTCCAGG CTACGGACAA 30660

GGGTCATCGG TTACTCAGTG TTACAGAAAG AATGACATGG AGATGTTTGT TACATCTTAA 30720

GGAACCATGA GGGGCCAGAG TATTTTACTC TAAGTGTAGA TGGTACATTG GCCACGCCTG 30780

TCCCAACACC ACCAATGGTG GCACCTAACT TTTGTGTTTG TGCCCCACAT TTCTTCTTCT 30840

TTTCTGACGT AAATGCAAGT GATATTCCTT GGAAACCATG CTGCAGCAAG AGGCCATCTG 30900

ACTACTAGTG ATACCCTGTA GCTCACCTAC AGCAGCTCAC TTGAAGCAGC TCACCCATAG 30960

CTCAGGTATA GCTCACCTGC AGCGGCTCAC CTGTAGCTCA CGTGTAGCTC ACTTGTAGCA 31020

GCTCACTGGT AGCTCACCTG CAGCAGCTCA CCTGTACCTC ACCTGTACCT CACCTGCAGC 31080

AGCTCACCTG TAGCTCACCT GTACGTGAGC CACCGTACCC GGCCAGCAAG ACCCCATTTC 31140

TAAAATAAAT ACACAAAAAT TAGCCGGACG CGGTGGCGCG TGTCTGTAGT TGTAGCTACT 31200

CAGGAGGCTG AGGTGGGAGG ATTGCTGGAG GCTGGGAGGT AGAGGCTGCA GTGAACCGTG 31260

ATCCAGCCAC TGTACTCTAG CCTGGATGAC ATAGCAAAAC CTTGTCTCAA AAAACAAAAA 31320

CAAAAAACAA AACAAAGAAA CAAACAAAAA ACCCACACAC ACCGGAAAAC AAAACAAAAA 31380
GCAAAAAGGA AAGAAAAGAG AGCCAGGTCC CAAATATATA TTTCCTTGGA GAACCATTTG 
CAAAGAGCAC ACTTAAGGCC GGGCGCGGTG GCTCACGCCT GTCATCCCGG CACTTTGGGA 
GGCCGAGGTG GGTGGATCAC GAGGTTGGGA GATCGAGACC ATCCTGGCCA ACATGGCGAA 
ACCCCATCTC TACTAAAAAT ACAAAAAATC AGCCAGGTGC TGAGGCAGGT GCCTGTAGTC 
CCAGCCACTC AGGAGGCTGA GGCAGGAGAA TGGCATGAAC CTGGGAGGTG GAGGTTGCAG 
TGAGCCGAGA TCGCGCCCCT GCACTCCAGC CTGGGCGACA GAGCGAGACT CCTTCTCAAA 
TAAATAAATA AATAAATAAC AAAGAGCAAA CTTAAAATTG TCTCAGAAAT CCCACGGGAT 
ATTGGATCTC CCTCATGCCT ATCTGATGAC ACTTTGAGTG TCTGGGGCCC CGTGCCTATT 
TTCTGGGGTT CCCAGAAGCT GCCGTTCTGA AAGTGTGGCT CTCGGGGACG TGGCACAGGT 
GTGGATGTCT GTTTTAAATG TCAGGCGTTT GGACGTTGAG GAACGTGAGG CTGAAGGTCG 
CCTTCGCCGA CCCCCTGAGT TTAGGGTCCT GCCTTTTAAA ATCTTCCCAG CACTCTGTTG 
TTCACGCAAG CGTCCCATCT GTTTGGGTGG CCGTGCCGTC TGCATCTGTC TCGAACCTTC 
ACAGCTTTGC AGAATATCCT GTTTCTCAAT ACGGATGGAG AAACACGAGA CGCGTTTTCT 
GGGTTATTTT AGCCGTCACG GAGAACCCCA GACTCATGTG TGCTAATGAC CTCATTAATG 
ATACTCTGAG GCAGACAGCC CTGCCTGATC TTAACAACAT TTTTTAAATT TCTTTTTTTG 
TTGTTGTTGT TACAGCATCA TTCATATAAC GTAGGAAACC GTGATCAGTA GCTTTTAGGA 
TATTTGCAAC AGGGTGTAAC ADAAABD 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 806 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "SHOT" 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 43..615 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTGTCCCCGG AGCTGAAAGA TCGCAAAGAG GATGCGAAAG GGATGGAGGA CGAAGGCCAG 60 
ACCAAAATCA AGCAGAGGCG AAGTCGGACC AATTTCACCC TGGAACAACT CAATGAGCTG 120 
GAGAGGCTTT TTGACGAGAC CCACTATCCC GACGCCTTCA TGCGAGAGGA ACTGAGCCAG 180 
CGACTGGGCC TGTCGGAGGC CCGAGTGCAG GTTTGGTTTC AAAATCGAAG AGCTAAATGT 240 
AGAAAACAAG AAAATCAACT CCATAAAGGT GTTCTCATAG GGGCCGCCAG CCAGTTTGAA 300 
GCTTGTAGAG TCGCACCTTA TGTCAACGTA GGTGCTTTAA GGATGCCATT TCAGCAGGTT 360 
CAGGCGCAGC TGCAGCTGGA CAGCGCTGTG GCGCACGCGC ACCACCACCT GCATCCGCAC 420 
CTGGCCGCGC ACGCGCCCTA CATGATGTTC CCAGCACCGC CCTTCGGACT GCCGCTCGCC 480 

ACGCTGGCCG CGGATTCGGC TTCCGCCGCC TCGGTAGTGG CGGCCGCAGC AGCCGCCAAG 540 

ACCACCAGCA AGGACTCCAG CATCGCCGAT CTCAGACTGA AAGCCAAAAA GCACGCCGCA 600 

GCCCTGGGTC TGTGACVCCA ACGCCAGCAC CAATGTCGCG CCTGTCCCGC GGCACTCAGC 660 

CTGCASNCCC TNDDKANMCG TTRCTYHTCM ATTACACTTT GGGACCYCGG GDBAGVCCTT 720 

TTNNAGACTT YVATKGGSCW CSCTGGBCCC TBRKGAWAC TTGSGHYCGR GAACCGAKHT 780 

GCCCABAYGA GGACCRGTTT GGAKDG 806 
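As a rough cross-check of the feature table above, the CDS interval 43..615 spans 615 - 43 + 1 = 573 nucleotides, i.e. 191 codons: 190 amino acids plus the TGA stop codon at 613..615, which agrees with the 190-residue length given for SEQ ID NO: 16. The Python sketch below is illustrative only (the helper names are ours, not part of the listing); it assumes the standard genetic code and translates the first 78 bases of the CDS, taken from the first two lines of the listing.

```python
# Minimal sketch: CDS coordinate arithmetic and translation with the standard
# genetic code (NCBI translation table 1). Helper names are illustrative only.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def cds_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval such as '43..615'."""
    return end - start + 1

def translate(cds: str) -> str:
    """Translate complete codons; '*' marks a stop codon."""
    cds = cds.replace(" ", "").upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

# First two 60-base lines of SEQ ID NO: 15 (positions 1-120).
lines = (
    "GTGTCCCCGG AGCTGAAAGA TCGCAAAGAG GATGCGAAAG GGATGGAGGA CGAAGGCCAG"
    "ACCAAAATCA AGCAGAGGCG AAGTCGGACC AATTTCACCC TGGAACAACT CAATGAGCTG"
).replace(" ", "")

n = cds_length(43, 615)
print(n, n // 3, n // 3 - 1)   # 573 191 190
print(translate(lines[42:]))   # MEDEGQTKIKQRRSRTNFTLEQLNEL
```

The 3' end of the listing contains IUPAC ambiguity codes (for example V, N, K); this table has no entry for them, so the sketch raises a KeyError rather than guessing a residue.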
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 190 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Glu Asp Glu Gly Gln Thr Lys Ile Lys Gln Arg Arg Ser Arg Thr 
1 5 10 15 

Asn Phe Thr Leu Glu Gln Leu Asn Glu Leu Glu Arg Leu Phe Asp Glu 
20 25 30 

Thr His Tyr Pro Asp Ala Phe Met Arg Glu Glu Leu Ser Gln Arg Leu 
35 40 45 

Gly Leu Ser Glu Ala Arg Val Gln Val Trp Phe Gln Asn Arg Arg Ala 
50 55 60 

Lys Cys Arg Lys Gln Glu Asn Gln Leu His Lys Gly Val Leu Ile Gly 
65 70 75 80 

Ala Ala Ser Gln Phe Glu Ala Cys Arg Val Ala Pro Tyr Val Asn Val 
85 90 95 

Gly Ala Leu Arg Met Pro Phe Gln Gln Val Gln Ala Gln Leu Gln Leu 
100 105 110 

Asp Ser Ala Val Ala His Ala His His His Leu His Pro His Leu Ala 
115 120 125 

Ala His Ala Pro Tyr Met Met Phe Pro Ala Pro Pro Phe Gly Leu Pro 
130 135 140 

Leu Ala Thr Leu Ala Ala Asp Ser Ala Ser Ala Ala Ser Val Val Ala 
145 150 155 160 

Ala Ala Ala Ala Ala Lys Thr Thr Ser Lys Asp Ser Ser Ile Ala Asp 
165 170 175 

Leu Arg Leu Lys Ala Lys Lys His Ala Ala Ala Leu Gly Leu 
180 185 190 
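SEQ ID NO: 16 uses three-letter residue codes, whereas the sequence figures give one-letter codes. A minimal Python sketch for converting a row of the listing is shown below; the dictionary is the standard IUPAC correspondence and the function name is illustrative. A misread token (for example "Gin" for "Gln") raises a KeyError instead of passing silently.

```python
# Standard IUPAC three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(row: str) -> str:
    """Convert one listing row; residue-number tokens are skipped."""
    return "".join(THREE_TO_ONE[tok] for tok in row.split() if not tok.isdigit())

first = "Met Glu Asp Glu Gly Gln Thr Lys Ile Lys Gln Arg Arg Ser Arg Thr 1 5 10 15"
last = "Leu Arg Leu Lys Ala Lys Lys His Ala Ala Ala Leu Gly Leu 180 185 190"
print(to_one_letter(first))  # MEDEGQTKIKQRRSRT
print(to_one_letter(last))   # LRLKAKKHAAALGL
```

The last row converts to the C-terminal residues LRLKAKKHAAALGL, which can be compared directly with the one-letter translation in the SHOT figure.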
Claims 

1. A pharmaceutical composition comprising a protein having regulating activity on human growth, whereby the protein
is encoded by a nucleic acid molecule comprising the nucleotide sequence SHOX ET93 [SEQ ID NO: 2] and a nucleotide
sequence selected from the group consisting of SHOX G310 [SEQ ID NO: 3], SHOX ET45 [SEQ ID NO: 4], SHOX G108
[SEQ ID NO: 5], SHOX Va [SEQ ID NO: 6] and SHOX Vb [SEQ ID NO: 7], or the nucleotide sequence of SHOT
[SEQ ID NO: 15]. 

2. A pharmaceutical composition according to claim 1 comprising a protein having the amino acid sequence of SHOXa 
[SEQ ID NO: 11]. 

3. A pharmaceutical composition according to claim 1 comprising a protein having the amino acid sequence of SHOXb 
[SEQ ID NO: 13]. 

4. A pharmaceutical composition according to claim 1 comprising a protein having the amino acid sequence of SHOT 
[SEQ ID NO: 16]. 

5. Use of a protein having regulating activity on human growth, whereby the protein is encoded by a nucleic acid
molecule comprising the nucleotide sequence SHOX ET93 [SEQ ID NO: 2] and a nucleotide sequence selected from the
group consisting of SHOX G310 [SEQ ID NO: 3], SHOX ET45 [SEQ ID NO: 4], SHOX G108 [SEQ ID NO: 5], SHOX Va
[SEQ ID NO: 6] and SHOX Vb [SEQ ID NO: 7], or the nucleotide sequence of SHOT [SEQ ID NO: 15], for the
preparation of a pharmaceutical composition for the treatment of short stature. 

6. Use of a protein according to claim 5, the protein having the amino acid sequence of SHOXa [SEQ ID NO: 11]. 


7. Use of a protein according to claim 5, the protein having the amino acid sequence of SHOXb [SEQ ID NO: 13]. 

8. Use of a protein according to claim 5, the protein having the amino acid sequence of SHOT [SEQ ID NO: 16]. 

9. A method for the preparation of a medicament for the in vivo treatment of human growth disorders related to a
genetic defect in the SHOX or SHOT gene by gene therapy, said SHOX gene having the partial nucleotide sequence
as given in [SEQ ID NO: 8] or having the nucleotide sequence given in [SEQ ID NO: 14] and said SHOT gene
having the nucleotide sequence given in [SEQ ID NO: 15], the method comprising introducing into an isolated
human cell an expression plasmid in which a nucleic acid molecule is incorporated downstream from the expression
promoter that effects expression in a human host cell, said nucleic acid molecule comprising the nucleotide sequence
SHOX ET93 [SEQ ID NO: 2] and a nucleotide sequence selected from the group consisting of SHOX G310
[SEQ ID NO: 3], SHOX ET45 [SEQ ID NO: 4], SHOX G108 [SEQ ID NO: 5], SHOX Va [SEQ ID NO: 6] and SHOX
Vb [SEQ ID NO: 7], or said nucleotide sequence having the sequence of SHOT [SEQ ID NO: 15]. 

10. A method according to claim 9 whereby the nucleic acid molecule encodes a protein having the amino acid
sequence of SHOXa [SEQ ID NO: 11]. 

11. A method according to claim 9 whereby the nucleic acid molecule encodes a protein having the amino acid
sequence of SHOXb [SEQ ID NO: 13]. 


12. A method according to claim 9 whereby the nucleic acid molecule encodes a protein having the amino acid
sequence of SHOT [SEQ ID NO: 16]. 

13. Use of a human growth protein for the preparation of medicaments for the treatment of patients being suspected
of having a genetic defect in the human growth gene SHOX, said SHOX gene having the partial nucleotide sequence
as given in [SEQ ID NO: 8]. 

14. Use of a human growth protein for the preparation of medicaments for the treatment of patients being suspected
of having a genetic defect in the human growth gene SHOX, said SHOX gene having the nucleotide sequence as
given in [SEQ ID NO: 14]. 

15. Use of a human growth protein for the preparation of medicaments for the treatment of patients being suspected
of having a genetic defect in the SHOT gene, said SHOT gene having the nucleotide sequence as given in
[SEQ ID NO: 15]. 

16. Use of a human growth protein according to claims 13-15, said patients being identified as having a genetic defect
in the human growth gene SHOX or the SHOT gene using a nucleic acid molecule capable of hybridizing to the
SHOX or SHOT gene under the following stringent hybridization conditions: 0.5 M NaPi pH 7.2, 7 % SDS and
1 mM EDTA at 65 °C followed by a wash in 40 mM NaPi and 1 % SDS at 65 °C. 

17. Use of a human growth protein for the preparation of medicaments for the treatment of short stature in a human
subject having a genetic defect in the SHOX or SHOT gene [SEQ ID NO: 15], said SHOX gene having the partial
nucleotide sequence as given in [SEQ ID NO: 8] or having the nucleotide sequence given in [SEQ ID NO: 14],
said human subject being identified by a method comprising determining said genetic defect in a biological sample
isolated from said human subject suspected of having a genetic defect in the SHOX or SHOT gene. 

18. Use of a human growth protein according to any of claims 13-17 wherein the genetic mutation is caused by a hot
spot of mutation in the nucleic acid sequence encoding a protein truncation at amino acid position 195 in the SHOX
gene, said SHOX gene having the partial nucleotide sequence as given in [SEQ ID NO: 8] or having the nucleotide
sequence given in [SEQ ID NO: 14]. 

19. Use of a human growth protein according to claims 13-18 wherein the human growth protein is human growth
hormone. 

20. Use according to claim 19 with the proviso that the preparation of medicaments for the treatment of patients
suffering from Turner syndrome is excluded. 
SHOXa 




1 GTGATCCACCCGCGCGCACGGGCCGTCCTCTCCGCGCGGGGAGACGCGCGCATCCACCAG 

61 CCCCGGCTGCTCGCCAGCCCCGGCCCCAGCCATGGAAGAGCTCACGGCTTTTGTATCCAA 

1 MEELTAFVSK 
121 GTCTTTTGACCAGAAAAGCAAGGACGGTAACGGCGGAGGCGGAGGCGGCGGAGGTAAGAA 

11 SFDQKSKDGNGGGGGGGGKK 
181 GGATTCCATTACGTACCGGGAAGTTTTGGAGAGCGGACTGGCGCGCTCCCGGGAGCTGGG 
31 DSITYREVLESGLARSRELG 
241 GACGTCGGATTCCAGCCTCCAGGACATCACGGAGGGCGGCGGCCACTGCCCGGTGCATTT 
51 TSDSSLQDITEGGGHCPVHL 

301 GTTCAAGGACCACGTAGACAATGACAAGGAGAAACTGAAAGAATTCGGCACCGCGAGAGT 

71 FKDHVDNDKEKLKEFGTARV 

361 GGCAGAAGGGATTTATGAATGCAAAGAGAAGCGCGAGGACGTGAAGTCGGAGGACGAGGA 

91 AEGIYECKEKREDVKSEDED 

421 CGGGCAGACCAAGCTGAAGCAGAGGCGCAGCCGCACCAACTTCACGCTGGAGCAGCTGAA 

111 GQTKLKQRRSRTNFTLEQLN 

481 CGAGCTCGAGCGACTTTTTGACGAGACCCATTACCCCGACGCCTTCATGCGCGAGGAGCT 

131 ELERLFDETHYPDAFMREEL 

541 CAGCCAGCGCCTGGGGCTTTCCGAGGCGCGCGTGCAGGTTTGGTTCCAGAACCGGAGAGC 

151 SQRLGLSEARVQVWFQNRRA 

601 CAAGTGCCGCAAACAAGAGAATCAGATGCATAAAGGCGTCATCTTGGGCACAGCCAACCA 

171 KCRKQENQMHKGVILGTANH 

661 CCTAGACGCCTGCCGAGTGGCACCCTACGTCAACATGGGAGCCTTACGGATGCCTTTCCA 

191 LDACRVAPYVNMGALRMPFQ 

721 ACAGGTCCAGGCTCAGCTGCAGCTGGAAGGCGTGGCCCACGCGCACCCGCACCTGCACCC 

211 QVQAQLQLEGVAHAHPHLHP 

781 GCACCTGGCGGCGCACGCGCCCTACCTGATGTTCCCCCCGCCGCCCTTCGGGCTGCCCAT 

231 HLAAHAPYLMFPPPPFGLPI 

841 CGCGTCGCTGGCCGAGTCCGCCTCGGCCGCCGCCGTGGTCGCCGCCGCCGCCAAAAGCAA 

251 ASLAESASAAAVVAAAAKSN 

901 CAGCAAGAATTCCAGCATCGCCGACCTGCGGCTCAAGGCGCGGAAGCACGCGGAGGCCCT 

271 SKNSSIADLRLKARKHAEAL 

961 GGGGCTCTGACCCGCCGCGCAGCCCCCCGCGCGCCCGGACTCCCGGGCTCCGCGCACCCC 

291 GL* 

1021 GCCTGCACCGCGCGTCCTGCACTCAACCCCGCCTGGAGCTCCTTCCGCGGCCACCGTGCT 

1081 CCGGGCACCGCGGGAGCTCCTGCAAGAGGCCTGAGGAGGGAGGCTCCCGGGACCGTCCAC 

1141 GCACGACCCAGCCAGACCCTCGCGGAGATGGTGCAGAAGGCGGAGCGGGTGAGCGGCCGT 

1201 GCGTCCAGCCCGGGCCTCTCCAAGGCTGCCCGTGCGTCCTGGGACCCTGGAGAAGGGTAA 

1261 ACCCCCGCCTGGCTGCGTCTTCCTCTGCTATACCCTATGCATGCGGTTAACTACACACGT 

1321 TTGGAAGATCCTTAGAGTCTATTGAAACTGCAAAGATCCCGGAGCTGGTCTCCGATGAAA 

1381 ATGCCATTTCTTCGTTGCCAACGATTTTCTTTACTACCATGCTCCTTCCTTCATCCCGAG 

1441 AGGCTGCGGAACGGGTGTGGATTTGAATGTGGACTTCGGAATCCCAGGAGGCAGGGGCCG 

1501 GGCTCTCCTCCACCGCTCCCCCGGAGCCTCCCAGGCAGCAATAAGGAAATAGTTCTCTGG 

1561 CTGAGGCTGAGGACGTGAACCGCGGGCTTTGGAAAGGGAGGGGAGGGAGACCCGAACCTC 

1621 CCACGTTGGGACTCCCACGTTCCGGGGACCTGAATGAGGACCGACTTTATAACTTTTCCA 

1681 GTGTTTGATTCCCAAATTGGGTCTGGTTTTGTTTTGGATTGGTATTTTTTTTTTTTTTTT 

1741 TTTTTGCTGTGTTACAGGATTCAGACGCAAAAGACTTGCATAAGAGACGGACGCGTGGTT 

1801 GCAAGGTGTCATACTGATATGCAGCATTAACTTTACTGACATGGAGTGAAGTGCAATATT 

1861 ATAAATATTATAGATTAAAAAAAAAATAGC [A]n 
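In the SHOXa figure the start codon ATG occupies nucleotides 92-94 (row 61), and each amino-acid row after the first lists 20 residues, so the translated rows imply 292 residues (10 + 14 × 20 + 2). A minimal Python sketch of this coordinate arithmetic follows; the constant and function names are illustrative assumptions, and the two 60-base rows are copied from the figure. It locates the TGA stop codon at nucleotides 968-970, one of the first bases of row 961, and shows that residue 131 begins at nucleotide 482, one base into the row labelled 481.

```python
# Minimal sketch of the SHOXa figure's coordinate arithmetic (names illustrative).
ATG_START = 92    # first base of the start codon (row 61 of the figure)
ROW_WIDTH = 60    # nucleotides per figure row

def codon_start(residue: int) -> int:
    """1-based nucleotide position of the first base of a residue's codon."""
    return ATG_START + 3 * (residue - 1)

def row_label(position: int) -> int:
    """Number printed at the start of the figure row containing `position`."""
    return (position - 1) // ROW_WIDTH * ROW_WIDTH + 1

row_61 = "CCCCGGCTGCTCGCCAGCCCCGGCCCCAGCCATGGAAGAGCTCACGGCTTTTGTATCCAA"
row_961 = "GGGGCTCTGACCCGCCGCGCAGCCCCCCGCGCGCCCGGACTCCCGGGCTCCGCGCACCCC"

def codon_at(row_seq: str, row_start: int, position: int) -> str:
    """Three bases starting at `position`, read from the row beginning at `row_start`."""
    offset = position - row_start
    return row_seq[offset:offset + 3]

stop = codon_start(292 + 1)  # the codon after the last residue
print(codon_at(row_61, 61, codon_start(1)))                  # ATG
print(stop, row_label(stop), codon_at(row_961, 961, stop))   # 968 961 TGA
print(codon_start(131), row_label(codon_start(131)))         # 482 481
```

The sketch only reads within a single row, so it does not handle codons that straddle a row boundary; the start and stop codons checked here both lie wholly inside one row.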
1 GTGATCCACCCGCGCGCACGGGCCGTCCTCTCCGCGCGGGGAGACGCGCGCATCCACCAG 

61 CCCCGGCTGCTCGCCAGCCCCGGCCCCAGCCATGGAAGAGCTCACGGCTTTTGTATCCAA 

1 MEELTAFVSK 

121 GTCTTTTGACCAGAAAAGCAAGGACGGTAACGGCGGAGGCGGAGGCGGCGGAGGTAAGAA 

11 SFDQKSKDGNGGGGGGGGKK 

181 GGATTCCATTACGTACCGGGAAGTTTTGGAGAGCGGACTGGCGCGCTCCCGGGAGCTGGG 

31 DSITYREVLESGLARSRELG 

241 GACGTCGGATTCCAGCCTCCAGGACATCACGGAGGGCGGCGGCCACTGCCCGGTGCATTT 

51 TSDSSLQDITEGGGHCPVHL 

301 GTTCAAGGACCACGTAGACAATGACAAGGAGAAACTGAAAGAATTCGGCACCGCGAGAGT 

71 FKDHVDNDKEKLKEFGTARV 

361 GGCAGAAGGGATTTATGAATGCAAAGAGAAGCGCGAGGACGTGAAGTCGGAGGACGAGGA 

91 AEGIYECKEKREDVKSEDED 

421 CGGGCAGACCAAGCTGAAGCAGAGGCGCAGCCGCACCAACTTCACGCTGGAGCAGCTGAA 

111 GQTKLKQRRSRTNFTLEQLN 

481 CGAGCTCGAGCGACTTTTTGACGAGACCCATTACCCCGACGCCTTCATGCGCGAGGAGCT 

131 ELERLFDETHYPDAFMREEL 

541 CAGCCAGCGCCTGGGGCTTTCCGAGGCGCGCGTGCAGGTTTGGTTCCAGAACCGGAGAGC 

151 SQRLGLSEARVQVWFQNRRA 

601 CAAGTGCCGCAAACAAGAGAATCAGATGCATAAAGGCGTCATCTTGGGCACAGCCAACCA 

171 KCRKQENQMHKGVILGTANH 

661 CCTAGACGCCTGCCGAGTGGCACCCTACGTCAACATGGGAGCCTTACGGATGCCTTTCCA 

191 LDACRVAPYVNMGALRMPFQ 

721 ACAGATGGAGTTTTGCTCTTGTCGCCCAGGCTGGAGTATAATGGCATGATCTCGACTCAC 

211 QMEFCSCRPGWSIMA* 

781 TGCAACCTCCGCCTCCCGAGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGG 

841 ATTACAGGTGCCCACCACCATGTCAAGATAATGTTTGTATTTTCAGTAGAGATGGGGTTT 

901 GACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAGGTGATCCACCCGCCTTAGCC 

961 TCCCAAAGTGCTGGGATGACAGGCGTGAGCCCCTGCGCCCGGCCTTTGTAACTTTATTTT 

1021 TAATTTTTTTTTTTTTTTAAGAAAGACAGAGTCTTGCTCTGTCACCCAGGCTGGAGCACA 

1081 CTGGTGCGATCATAGCTCACTGCAGCCTCAAACTCCTGGGCTCAAGCAATCCTCCCACCT 

1141 CAGCCTCCTGAGTAGCTGGGACTACAGGCACCCACCACCACACCCAGCTAATTTTTTTGA 

1201 TTTTTACTAGAGACGGGATCTTGCTTTGCTGCTGAGGCTGGTCTTGAGCTCCTGAGCTCC 

1261 AAAGATCCTCTCACCTCCACCTCCCAAAGTGTTAGAATTACAAGCATGAACCACTGCCCG 

1321 TG GTC TC CAAAAA.-_\GGA ~ TGTTAC GTGG [A] r . 



GTGTCCCCGGAGCTGAAAGATCGCAAAGAGGATGCGAAAGGGATGGAGGACGAAGGCCAG 

MEDEGQ 



ACCAAAATCAAGCAGAGGCGAAGTCGGACCAATTTCACCCTGGAACAACTCAATGAGCTG 
TKIKQRRSRTNFTLEQLNEL 



GAGAGGCTTTTTGACGAGACCCACTATCCCGACGCCTTCATGCGAGAGGAACTGAGCCAG 

CGACTGGGCCTGTCGGAGGCCCGAGTGCAGGTTTGGTTTCAAAATCGAAGAGCTAAATGT 

AGAAAACAAGAAAATCAACTCCATAAAGGTGTTCTCATAGGGGCCGCCAGCCAGTTTGAA 
RKQENQLHKGVLIGAASQFE 

GCTTGTAGAGTCGCACCTTATGTCAACGTAGGTGCTTTAAGGATGCCATTTCAGCAGGAT 
ACRVAPYVNVGALRMPFQQD 

AGTCATTGCAACGTGACGCCCTTGCCCTTTCAGGTTCAGGCGCAGCTGCAGCTGGACAGC 
SHCNVTPLPFQVQAQLQLDS 

GCTGTGGCGCACGCGCACCACCACCTGCATCCGCACCTGGCCGCGCACGCGCCCTACATG 
AVAHAHHHLHPHLAAHAPYM 

ATGTTCCCAGCACCGCCCTTCGGACTGCCGCTCGCCACGCTGGCCGCGGATTCGGCTTCC 
MFPAPPFGLPLATLAADSAS 

GCCGCCTCGGTAGTGGCGGCCGCAGCAGCCGCCAAGACCACCAGCAAGGACTCCAGCATC 
AASVVAAAAAAKTTSKDSSI 

GCCGATCTCAGACTGAAAGCCAAAAAGCACGCCGCAGCCCTGGGTCTGTGACGCCAACGC 
ADLRLKAKKHAAALGL* 

CAGCACCAATGTCGCGCCTGTCCCGCGGCACTCAGCCTGCACGCCCTCCGCGCCCCGCTG 
CTTCTCCGTTACCCCTTTGAGACCTCGGGAGCCGGCCCTCTTCCCGCCTCACTGACCATC 
CCTCGTCCCCTATCGCATCTTGGACTCGGAAAGCCAGACTCCACGCAGGACCAGGGATCT 
CACGAGGCACGCAGGCTCCGTGGCTCCTGCCCGTTTTCCTACTCGAGGGCCTAGAATTGG 
GTTTTGTAGGAGCGGGTTTGGGGGAGTCTGGAGAGAGACTGGACAGGGTAGTGCTGGAAC 
CGCGGAGTTTGGCTCACCGCAAAGCTACAACGATGGACTCTTGCATAGAAAAAAAAAATC 
TTGTTAACAATGAAAAAATGAGCAAACAAAAAAATCGAAAGACAAACGGGAGAGAAAAAG 
AGGAAGGCAACTTATTTCTTAACTGCTATTTGGCAGAAGCTGAAATTGGAGAACCAAGGA 
GCAAAAACAAATTTTAAAATTAAAGTATTTTATACATTTAAAAATATGGAAAAACAACCC 
AGACGATTCTCGAGAGACTGGGGGGAGTTACCAACTTAAATGTGTGTTTTAAAAAATGCG 
CTAAGAAGGCAAAGCAGAAAGAAGAGGTATACTTATTTAAAAAACTAAGATGAAAAAAGT 
GCGCAGGTGGGAAGTTCACAGGTTTTGAAACTGACCTTTTTCTGCGAAGTTCACGTTAAT 
ACGAGAAATTTGATGAGAGAGGCGGGCCTCCTTTTACGTTGAATCAGATGCTTTGAGTTT 
AAACCCACCATGTATGGAAGAGCAAGAAAAGAGAAAATATTAAAACGAGGAGAGAGAAAA 
ATAATGGCAAAACTGTCTGGACTGCTGACAGTAAATTCCGGTTTGCATGGAAAAAAAAAA 
AAAAAAAAAAAAAAAAA 
1. Claims: 1-12 totally and 13-18 partially 

Use of SHOXa, SHOXb or SHOT and nucleic acid encoding them 
in the preparation of pharmaceutical compositions. Gene 
therapy. 



2. Claims: 13-18 partially and 19-20 totally 

Use of a human growth hormone for the preparation of 
medicaments for the treatment of patients having a genetic 
defect in the SHOX or SHOT gene. 